Supercomputer Built With 8 GPUs
FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks but on this NVIDIA-based supercomputer it only takes a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."
By what benchmark are eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?
My blog
They didn't have enough dough for 9.
Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?
So basically the drivers screw up the computer and cause an invalid pointer reference of some sort every few hours :P
This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
Ok, probably a paid NVIDIA ad placement, but check TFA anyway (and even if you don't read, you gotta love the case). It looks like heat generation is one of the biggest problems--sweet.
...and at 4000EUR, that comes to what (rolls dice, consults sundial) about $20000 American?
I like this too:
The medical researchers ran some benchmarks and found that in some cases their 4000EUR desktop superPC outperforms CalcUA, a 256-node supercomputer with dual AMD Opteron 250 2.4GHz chips that cost the University of Antwerp 3.5 million euro in March 2005...
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
In other news Graphics cards are good at . . . graphics.
I can't imagine that it is a coincidence that this comes along just as Nvidia are crowing about CUDA, or that the resulting machine looks like a gamer's dream rig.
While there is ample crossover between hardware enthusiasts and academia, anyone solely with the computation interest in mind probably wouldn't be selecting neon fans or aftermarket coolers, or spending that much time on presentable wiring.
They are useful for applications that can be massively parallelized. Your average program can't break off into 128 threads, that takes a little bit of extra skill on the coder's part. If, for example, someone could port gcc to run on the GPU, think of how happy those Gentoo folks would be :) (make -j128)!
Obligatory blog plug: http://www.caseybanner.ca/
... 3D Realms announced this as the minimum platform requirements to run Duke Nukem Forever.
__ Someday, but not this morning, I'll finally learn to use the preview button.
It does not run Linux.
Can it? Anybody?
Help stamp out iliturcy.
No; if you read all the way to the end, you can see where they discuss the limited specific "general" programs that currently support this kind of thing. Namely, folding@home (on ATI cards) and maybe Photoshop in the future. The tomography software they use is likely their own code, is graphics-heavy, and is tailored for this set-up.
For information on their current HPC platform, check out http://www.nvidia.com/object/tesla_computing_solutions.html FWIW, I don't think there would be that big of a performance advantage to putting the GPUs on the motherboard; in fact, you'd probably get a performance decrease if you UMA'd the memory. With discrete boards each GPU has its own framebuffer, resulting in higher memory bandwidth.
Something that can play Crysis!
GPGPU only works with highly parallelizable problems.
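To put a rough number on "highly parallelizable", here is a minimal sketch using Amdahl's law. Nobody in the thread names it; it is brought in purely as an illustration. The function name `amdahl_speedup` and the worker count of 128 are assumptions picked for the example, not anything measured on the FASTRA box.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the runtime can be
    spread over `workers` units and the rest stays serial (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workloads: the fractions are made up for illustration.
for fraction in (0.50, 0.90, 0.99, 1.00):
    speedup = amdahl_speedup(fraction, workers=128)
    print(f"{fraction:.0%} parallel on 128 workers: {speedup:.1f}x")
```

In this idealized model, code that is only half parallel cannot even double its speed no matter how many stream processors you throw at it, which is the usual argument for why GPUs only pay off on problems that are almost entirely data parallel.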
Why not just buy a premade Tesla system from nVidia and avoid the heating problems?
In this instance for this particular problem domain GPUs outperform CPUs because of the calculations they are designed to handle. This is not an indicator that GPUs outperform CPUs in ALL problem domains though.
This is an example of an acceleration architecture. Anyone who has used FPGAs knows that. Of course, making sensational news is an all too common thing on /.
They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee
Does it run linux or Crysis??
It is possible to solve non-graphics problems on graphics cards nowadays, but the hardware is still very specialized. You don't want the GPU to run your OS or your web browser or any of that; when a SIMD (single instruction, multiple data) problem arises, a decent computer scientist should recognize it and use the tools he has available.
Also, this stuff isn't as mature as normal C programming, so issues that don't always exist in software that's distributed to the general public will crop up because not everyone's video card will support everything that's going on in the program.
Pardon the italics, but I was impacted by the killer slant of this posting.
For specific kinds of calculations, sure, GPGPU supercomputing is superior. I would question what software optimization they had applied to the 300 CPU system. Apparently, none. Let's not sensationalize quite so much, shall we?
Invenio via vel creo
The idea is to replace the CPU with a GPU and use code morphing to convert x86 code to run on the GPU. Think about it, if one GPU is as powerful as 37 Core Duo CPUs why bother having a CPU on the motherboard at all?
It is also not difficult to find other tasks where, e.g., FPGAs perform vastly better than general-purpose CPUs. That does not make an FPGA a "Supercomputer". Stop the BS, please.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Probably a combination of the heat/power budget for a CPU and the fact that programs have to be written specifically for GPUs.
Check out the GPGPU (General Purpose GPU) project:
http://www.gpgpu.org/
GPUs are good at doing what they are programmed for: calculations with floating point numbers.
They aren't very good at doing multiple different things like a CPU. Every time you want to do something different you have to reprogram it, which would suck for normal use on a desktop.
This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
Simple: This is not a supercomputer at all, just special-purpose hardware running a very special problem. For general computations, GPUs are pretty inferior.
I love this picture: http://fastra.ua.ac.be/en/images.html
Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
I don't understand completely what you mean, but:
* Only some people need the extra speed.
* The tasks that can be accelerated this much belong to a specific domain of computation.
* NVIDIA is just getting started.
Jag pratar lite svenska.
It is obvious that, if a computer is using GPUs exclusively, it is limited to vector or data parallel processing. And it is no surprise that it is being used by an outfit that specializes in visual processing, which is ideally suited to data parallel processing. Change the benchmark program to code that has a lot of data dependencies and this "supercomputer" will choke to a crawl.
Wave of the Future? Yes*. Revolution in computing? Not quite.
The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.
The researchers took a specialized piece of hardware, rewrote their code for it, and found it was faster than their original code on generic hardware. The problems here are that you have to rewrite your code (High Energy Physics codebases are about a GB, compiled... other sciences are similar) and you have to have a problem which will run well on this scheme. Have a discrete problem? Too bad. Have a gigantic, tightly coupled problem which requires lots of inter-GPU communication? Too bad.
Have a tomography problem which requires only 1GB of RAM? Here you go...
The standard supercomputer isn't going away for a long, long time. Now, as before, a one-size-fits-all approach is silly. You'll start to see sites complement their clusters and large-SMP machines with GPU power as scientists start to understand and take advantage of them. Just remember, there are 10-20 years of legacy code which will need to be ported... it's going to be a slow process.
In all likelihood, if they tried too hard to advertise their speed advantage, they would be called dishonest because most programs would not exhibit a speedup.
Contribute to civilization: ari.aynrand.org/donate
Fortunately, Nvidia provides a CUDA version of the basic linear algebra subprograms, so even if your software is hard to port, you can speed it up considerably if it does some big matrix operations, which can easily take a long time on a CPU.
But how many FPS does it do with Duke Nukem Forever? --Toll_Free
They are comparing their system against normal computers. I'd be interested to see a benchmark against a vector computer, e.g. the NEC SX-9.
Because for 95%+ of the problems a general purpose computer tackles GPU's would suck. It's only in very special cases that GPU's outperform CPU's. Thus, your idea is a poor one.
But the motherboard didn't have enough PCIe x16 slots!
Precisely. But that happens to be one of the areas where more performance is still needed.
You don't need a super-duper CPU for text editing, that's for sure. For most of the tasks people do on computers, we have had CPU enough for the last 15 years or more. But where we still need more CPU happens to be mostly in tasks that ARE massively parallel, for instance, physics simulations, of which you will find several examples in the nVidia site.
I'm following this technology with much interest, and I think I will have a major upgrade in my home computer soon. My old FX-5200 card has been more than enough for my gaming needs, but now I have a new reason for upgrading.
The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than 4,000 EUR to build
What's more crazy: calling something this inexpensive a supercomputer, or 4 video cards costing a freaking 4,000 EUR.
Nvidia offers an external GPU solution specifically for "deskside supercomputing", the Tesla D870. It has only 2 cores at 1.35GHz each. Apart from it being a bit more expensive, I wonder how it compares (you can connect several to a PC).
"I love my job, but I hate talking to people like you" (Freddie Mercury)
AFAIK, in many (all?) ordinary consumer graphics cards, minor mistakes by the GPU are tolerated because they'll typically result in (at worst) minor or unnoticeable glitches in the display. I assume that this is because, to get the best performance, designers push the hardware beyond levels that would be acceptable otherwise.
Clearly if you're using them for other mathematical operations, or to partly replace a standard CPU, such mistakes might *not* be acceptable.
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
What a pity that they are running Windows XP. This computer could be the first to see running Vista at an acceptable speed.
They are using graphical processors to process graphics. Truly revolutionary. Who woulda thunkit?
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Ok, so if GPUs are far more powerful than CPUs, why doesn't nVidia just create a CPU based on their GPU? AMD & Intel would be in serious trouble then. And if 300 CPUs can be overtaken by just 8 GPUs, why the heck haven't people been doing it before? No one got such an idea? Yeah, right... Something stinks here...
My Windows is NOT slow, it's special!
*Insert Beowulf cluster meme*
In Soviet Russia, everything runs linux.
I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Since when have "vector processors died out"? The "Earth Simulator", for example, used the NEC SX-6 CPU; currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell are no match for either processor: first, they both are only fast in single precision mode and much slower when they have to do double precision (the second generation of Cell is better at double precision), and they both have a weak memory subsystem when compared to a true VPU. It is slow and they can only use small memories. As far as I know the Cell can't even chain its VPUs, something which has been standard on VPUs since the Cray-2.
I have the impression that the graphics cards only do single precision floating point math, just like the CELL. Makes sense since both were developed for gaming, and single precision is sufficient for graphics.
For many scientific problems however, double precision is a must have. I guess that's one reason why GPU based systems aren't in the
I wish people were more explicit about what kind of FLOP they are talking about when claiming TeraFLOPs performance.
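For anyone wondering why the single vs. double distinction matters, here is a small sketch. It emulates single precision on the CPU by rounding every intermediate result to 32 bits with the standard `struct` module; it does not touch a GPU, and the helper names `to_float32` and `sum_float32` are made up for the example.

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate values while rounding every partial sum to 32 bits,
    roughly what a single-precision-only accumulator would do."""
    total = 0.0
    for value in values:
        total = to_float32(total + to_float32(value))
    return total


# One large value followed by a hundred small ones.
values = [16777216.0] + [1.0] * 100   # 16777216 is 2**24

print("double precision sum:", sum(values))
print("single precision sum:", sum_float32(values))
```

Above 2**24 a 32-bit float can no longer represent every integer, so each added 1.0 is rounded away and the single precision total never moves, while the double precision total comes out right. Long accumulations in scientific code can hit the same effect in less obvious ways.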
They're just using them like FPUs that have been available for years. This is absolutely nothing new. I know there was an article a while back about hacking GSM encryption using a computer with multiple FPUs installed.
This might be a really stupid question, but a graph halfway down the page shows that the overclocked FASTRA outperforms the non-overclocked FASTRA. Is this a mistake? I was always under the impression an overclocked system would have a better performance...?
Sure, this machine is for floating point calculations. But for general-purpose computing, I think one of these setups would be more efficient: "24-core Linux cluster in IKEA cabinet" http://helmer.sfe.se/
Is that in single or double precision?
Last I checked these "supercomputers" could only do single precision in hardware.
How well would this one do in a standard double precision Linpack benchmark?
The main post does not contain a direct link to the project webpage: http://fastra.ua.ac.be/, which contains a lot more info and explanations.
Mind the frickin' laser...
Wouldn't be the first time a supercomputer was developed with a specific task in mind.
---- Booth was a patriot ----
...why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards [as the main processor]?
I think you partially answered your own question there. For nVidia, I bet they don't want to deal with the x86's legacy or Intel's henchmen. Ultimately, the 8:300 ratio probably has something to do with limitations of the x86 CPU architecture, and Windows compatibility. Motherboards with GPUs built-in are nothing new, but I'm assuming you meant a general purpose GPU in place of a regular CPU.
The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.
No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bring supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a way to design a universal multicore processor architecture that is at home in all possible parallel environments, not just vector parallel systems. IOW, we need a parallel processor that can handle anything we can throw at it with equal ease. Unfortunately, both the industry and academia are pushing the field toward so-called heterogeneous processors, hideous monsters that will be a nightmare to write code for. Check out Nightmare on Core Street for a good explanation of the multicore crisis and how it can be solved.
The key to why they were able to do this is the CUDA C programming environment:
http://www.nvidia.com/cuda
Which is faster? A Lamborghini or a 5-ton flatbed truck?
Depends on what you're after! If you are trying to get yourself from point A to point B, the Lamborghini is the obvious choice. But if you need to move 4.5 tons of stuff from point A to point B, the Lamborghini would suck ass when compared to the flatbed truck.
It's just a question of what you are trying to accomplish. There is no absolute framework for "power" to solve problems, even if you define it fairly narrowly. For example, let's talk about 'pattern matching': A free database (like PostgreSQL) on cheap hardware can search through millions of records to deliver a query result in a tenth of a second. In that respect, Postgres is WAY faster than, say, the human brain. But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.
Use the right tool for the job, and you'll be amazed at the results. That 8 GPUs handily outperform 512 CPU cores at a specific task is not surprising - the GPUs are designed from the beginning to solve the kind of problem that's needed!
Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
There are tons of other folks getting just as phenomenal results using NVIDIA's GPUs:
http://www.nvidia.com/object/cuda_showcase.html
Others are beating large supercomputers with just two GPUs.
Simply put, you do not understand why this is creating such a stir until you understand that this segment of the industry is trying to make your quote obsolete.
The guys in Antwerp have probably got themselves the greater number crunching power, but reconstruction of tomographic images has been done using similar multi-core hardware. See the following (pdf alert) from the University of Erlangen, which uses a cluster of PS3s for a great use of commodity consumer hardware http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.imp.uni-erlangen.de%2FIEEE%2520MIC2007%2FKnaup_Poster_M19-291.pdf&ei=t_FBSKnZKoie1gbh2Y23Bg&usg=AFQjCNG7vNGmMM2hBrYdVKbwZAJZL0oS3Q&sig2=sEdlnPROC77CZ_KJ5OOgrg .
What this makes me think about is the marketing campaign that 3DFx used back when they launched their Voodoo 5 series ("So powerful it's kind of ridiculous").
Most of the TV spots started by explaining how scientists could save humanity with GFLOPS-grade chips. But then, humorously, the TV spot announces that they decided to play games (often with hilarious effect on the various "dreams of a better humanity" that the first half of the spot showed).
In a funny twist, exactly the opposite actually happened: nVidia and ATI were aiming for the highest possible frame rate in games, but then they decided to do science and did CUDA & Brook.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
An ATI or nVidia GPU as main processor is completely out of the question. These things aren't even able to *call* a function (everything is inlined at compile time), not to mention that everything is run by a SIMD engine behind the scenes, which means that all the processing units must actually run the same code at the same time.
Intel, on the other hand, is touting its future Larrabee as being completely compatible with the x86 instruction set. The whole thing, according to them, should behave like a big many-core in-order CPU - think of Tilera's Tile64, but with overclocked Pentium1 + SSE (hum... probably the Larrabee is going to be a many-core Silverthorne?)
They have mentioned the possibility of producing Larrabees that use QuickPath to communicate with the outside and fit into a server socket. They could even be used alone and boot the OS by themselves, thanks to the compatibility with the original x86 instruction set.
Of course, that's only marketing, there are no Larrabees out yet. We shall see...
Impressive performance yes... but a supercomputer? Hardly!
Next article will be "hey, I overclocked my Q6600 and now it's a supercomputer."
Realistically this is a rather normal computer with a rather fast coprocessor.
The major difference between CPU's and GPU's is that CPU's have to handle the effects of different branch conditions. One of the optimizations that CPU's have been designed for is the combination of pipelining and dual path evaluation. Because the CPU is pipelining instructions, it has to evaluate both outcomes of each conditional instruction in parallel with the actual condition, and then select the actual outcome once it is actually known. Calculating the condition first, followed by the outcome, would reduce performance by half.
As GPU's are dedicated to floating point data, they don't have the space to do this kind of logic.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
On a more serious note, they shouldn't have elevated their monitor. Generally, the top of computer monitors should be at the eye-level of the user. From the picture, it doesn't seem so. This would cause more strain on the user's neck as he/she needs to look up more than looking down.
The way our neck bones are structured, makes looking up more strenuous than looking down. Hence, it is more comfortable to look downwards than upwards.
w00t
have you built many computers? with all of the systems i've put together, bundling wires is the most reasonable way to make the hardware manageable. additionally, the use of aftermarket cooling devices is often essential when you are planning to run your hardware at maximum capacity 24/7. if the plan for the hardware is something that the manufacturer never envisioned (which is clearly the case), then you have to find appropriate solutions. in the future, you may want to look a little more closely before jumping straight into harsh criticism.
personally, i think their heat issue could be solved with 1/2" tubes and water.
Thank dog I hope NOT!
GPU's are not nearly as accurate as CPU's are. Why? CPU's don't have the option of fudging the math because your eyes won't see a .1 pixel width on the screen which is what modern day GPUs do.
Dog help us if GPU's move to the primary chip slot on motherboards... it's bad enough we accept low quality software, who wants inaccurate hardware running low quality software?
> Check out my blog post Nightmare on Core Street ....
There, fixed that (IMNSHO) for you...
It's not about you and you can't stand it, can you? It hurts, doesn't it? That's too damn bad. This is just the beginning and the pain has only begun. I suspect you're an old computer geek. Time for you to retire. You're no longer needed. ahahaha...
Oh I understand perfectly why there is a 'stir', and I also understand why they aren't anywhere close to replacing general purpose CPU's yet. Not even close. I never said it wasn't interesting use for the technology.
For tomography reconstructions one typically needs SVD (Singular Value Decomposition) which are of n^2 sizes. E.g. consider a reconstructable image of 128 x 128 = 16384 pixels total, this needs (temporary) processing matrices of size 16384 x 16384, which, as you can see, grows rather fast with even moderate reconstruction grids.
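Taking the dense n^2 x n^2 matrix above at face value, the growth is easy to tabulate. This is only back-of-envelope arithmetic, assuming 4-byte single precision entries and a fully dense matrix; the function name is made up, and nothing here says this is how the ASTRA code actually stores its data.

```python
def dense_matrix_bytes(width, height, bytes_per_value=4):
    """Bytes needed for a dense (pixels x pixels) matrix, where pixels is the
    number of pixels in a width x height reconstruction grid."""
    pixels = width * height
    return pixels * pixels * bytes_per_value


for side in (64, 128, 256):
    gib = dense_matrix_bytes(side, side) / 2**30
    print(f"{side}x{side} grid: {side * side} pixels, {gib:g} GiB")
```

Doubling the side of the grid multiplies the matrix size by 16, so a 128 x 128 grid already needs 1 GiB in single precision and a 256 x 256 grid needs 16 GiB.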
I'm no great coder, nor mathematician, but wouldn't a GPU be perfect for simulations such as this one?
I imagine there is a lot of vector calculations going on, which GPUs are very good at. Maybe the communication between memory and GPU would be too slow.
Someone with more knowledge, please share your insights!
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors!
Looking at the last sequence of three posts in this thread, it seems obvious whose blood is boiling. LOL
Most supercomputers are optimized to perform 32 or 64 bit floating point operations, and are currently broaching the petaflop boundary. The reason is that most established uses of supercomputers rely on mathematical computation: Fourier Transforms, Linear Algebra, etc., which utilize floating point operations. Visualization and imaging have been done on GP supercomputers, but not very economically. There's constant buzz in the HPC community about FPGAs and GPUs, but CPUs remain the most flexible hardware to do the core job of HPC, which is floating point computation.
I wouldn't be surprised to see HPC centers adding (as budgets permit) GPU based solutions for visualization, imaging, ray tracing, etc. FPGA solutions appear, IMHO, to be a bit farther away from use. The biggest hold up for both will be the programming interfaces. HPC programming interfaces are just emerging for GPU solutions. It remains to be seen which will succeed.
Invenio via vel creo
...memory bandwidth? PCIe bandwidth? Raw GPU power?
You can safely assume that all problems of this nature are memory-limited because the cyclic rate of essentially every computer device, starting with the 486 and continuing through the CPUs and GPUs of today, is higher than that of memory. Thus the clock multiplier everyone talks about nowadays. The disparity is an artifact of the physical constraints of making each type of device with any given level of technology, and processing cleverness like SIMD/MIMD and hand-tuning like the Alpha can't make up the difference. The only way you can realistically escape the memory bottleneck is if you happen to have job-level parallelism (BOINC, Folding@home), which is to say your computation requires a very long sequence of transformations on many small chunks of data that don't depend on each other at all. Of course, there are lots of situations where this isn't the case.
Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup.
Memory bandwidth doesn't matter as much as you might think. The cards have 512MB/GPU * 2GPU/card * 4 cards = 4096MB of device memory. They bought twice that much system memory, but they have to run an OS and their application, so maybe their data is small enough and parallel enough to fit in their GPUs' memories, and maybe it isn't. Let's say it's too big, for starters:
a) The motherboard in question runs 4 PCIe slots at 8x, giving a theoretical limit of 2GB/s for each slot (thus card). The very large transfers they're probably making can probably approach something like 90% of this, so we'll say each card would like 1.8GB/s.
b) That motherboard with that CPU and that RAM gets real-world main memory bandwidth of ~6GB/s.
So: 4 * 1.8 = 7.2, and we have about 6 (conservatively). Even if we use these numbers, it's pretty close, but remember the big concept that (probably) makes it meaningless:
You can hide bandwidth deficiencies if you can figure a way to always stay busy on another chunk while you're doing data transfer on each individual chunk. In the case of the cards and PCIe 8x and memory buses, you should aim to process exactly 3 times longer than it takes to transfer. That's as good as it gets; you'll never be able to process data faster than you can transfer it.
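The arithmetic in this post can be checked with a few lines. This is only a sketch using the estimates quoted above (none of them are measurements), plus an idealized model of overlapping transfer with compute. The constant and function names are assumptions made up for the example.

```python
# Figures quoted in the post above; all are estimates, not measurements.
MB_PER_GPU = 512
GPUS_PER_CARD = 2
CARDS = 4
PCIE_PEAK_GBPS = 2.0      # theoretical per-slot limit quoted for PCIe 8x
PCIE_EFFICIENCY = 0.9     # assumed fraction reachable with large transfers
MAIN_MEMORY_GBPS = 6.0    # quoted real-world main memory bandwidth


def device_memory_mb():
    """Total device memory across all GPUs."""
    return MB_PER_GPU * GPUS_PER_CARD * CARDS


def pcie_demand_gbps():
    """Bandwidth all cards together would like to pull from main memory."""
    return CARDS * PCIE_PEAK_GBPS * PCIE_EFFICIENCY


def chunk_time(transfer_s, compute_s, overlapped):
    """Idealized steady-state time per chunk. With double buffering the
    slower of the two stages sets the pace; without it they simply add up."""
    if overlapped:
        return max(transfer_s, compute_s)
    return transfer_s + compute_s


print("device memory:", device_memory_mb(), "MB")
print("PCIe demand:", round(pcie_demand_gbps(), 2), "GB/s vs",
      MAIN_MEMORY_GBPS, "GB/s main memory")
# Hypothetical chunk: 1 s to transfer, 3 s to process.
print("serial:", chunk_time(1.0, 3.0, overlapped=False), "s per chunk")
print("overlapped:", chunk_time(1.0, 3.0, overlapped=True), "s per chunk")
```

In this model the transfer cost disappears entirely as long as compute per chunk is at least as long as transfer per chunk. Once it is shorter, the interconnect sets the pace.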
The same concept applies to a GPU itself; you use the same back-buffer concept. I remember reading a paper detailing the minutiae of getting a Geforce 8800 GTX to do a rather academic (i.e. rhetorical) galactic dynamics simulation as proof-of-concept. The concept is the same as in the system-level problem but GPU architecture is far more complex, and their extremely high performance came at the cost of having their code work on no other GPU even within the same family. Stanford's Folding@home releases for the ATI GPUs leave some performance on the table, but they're family-specific and a huge performance coup despite it.
These guys have 64GB/s to device memory, but device memory is "small" (at 512MB) and they have only about 1/32 that bandwidth to "big" memory (say, 7GB). I'm guessing they knew they could stay busy at least a bit longer than that, otherwise indeed they could do just as well with single-GPU cards as with dual, and using only 2 cards at 16x instead of 4 at 8x. That would save $1500 on graphics cards, $200 on power supply, and $200 on a case (normal 7-slot case and "normal" rated PSU).
...if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
You're describing "GPGPU clustering", and although there has been some work on this concept, it's not very attractive because it's even more difficult to tailor your application to the hardware, and you almost always need Infiniband (at least), which is so expensive that you're rapidly approaching the realm of speci
will it run Vista?
...But will it run Vista?
I do some intense video editing and the 'rendering' phase takes forever. I have a dual core AMD, fast FSB...etc.. still slow. But wouldn't it make sense to have a GPU (or 2 or ..) just for those sort of number crunches ?? I would imagine rendering can be parallel processed..?
I would think there would be a market for an add-in board that can crunch numbers like crazy and would work with the common video software (Vegas, Premiere, etc.) Maybe even have a direct socket for SATA drives so as to avoid the bus.
Home video editing has come down in price so much that I see a huge future market there.
Comparing something to an old supercomputer is stupid. At any given time a supercomputer is orders of magnitude faster than other machines (60 to 600 teraflops in mid-2008).
Actually, that's PCIe 2.0. They should be getting double that bandwidth.
Yup; missed that. So ~3.6GB/s is quite a bit nicer.
A lot of graphics work is considered to be "embarrassingly parallel", such that gigabit ethernet is more than adequate as an interconnect. This is what movie studios use for their render farms.
That's job-level parallelism, since it
requires a very long sequence of transformations on many small chunks of data that don't depend on each other at all.
I remember doing farm rendering over 10base-T, and I think some people were doing it even earlier over slower networks. I was thinking of that very example when I wrote that "long sequence of transformations on many small chunks of data that don't depend on each other" bit, actually. It's a perfect example of being able to stay busy for much longer than it takes to transfer the data. Each final result depends only on a subset of earlier calculations. If each calculation (at every step, not just at the end) depends on all the calculations before it, then you not only have to complete the previous computations (easy, given our beefy processors), but you also have to send the results to the processors, because those results are the new data (hard, given our slow interconnects starting with memory and going to bridges all the way down to networks).
As I tried to cover above, saying that a task parallelizes "very easily" doesn't mean much: there are lots of ways that a task might clearly parallelize, but those ways have very disparate properties and therefore call for different hardware.
Parallelism isn't just a spectrum, with "not" parallel and "embarrassingly" parallel at opposite ends. In the case of pretty much any physical simulation, such as n-body or fluid/thermodynamics simulations which evolve over time and are done with mesh or particle integration techniques, each step of the calculation depends on the state of the entire system at the previous time slice, so the calculations at each time slice require the results of every calculation that comes before them (not just "some" of them that affect it). Ray-tracing an image doesn't have this problem because the view at one perspective doesn't depend on the view from any other perspective, even if the view from every perspective is very time-consuming for you to calculate. So there are really two more considerations: the data-interdependency of the problem, and the computing time relative to the transfer time. If we can do our transformations as fast as our interconnect (network or memory bus) can deliver the data, then the problem becomes bandwidth-limited.
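A toy Python sketch of the data-dependency difference described above. Both functions are stand-ins invented for illustration (this is not a real ray tracer or n-body integrator): the pixels can be computed in any order on separate workers, while each simulation step needs the whole previous state.

```python
from concurrent.futures import ThreadPoolExecutor

def shade_pixel(xy):
    """Stand-in for tracing one ray: depends only on its own coordinates."""
    x, y = xy
    return (x * x + y * y) % 256

def step(state):
    """Stand-in for one time slice: every new value needs the whole old state."""
    total = sum(state)
    return [total - s for s in state]

pixels = [(x, y) for y in range(4) for x in range(4)]

# Independent work: hand the pixels to any number of workers, in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_image = list(pool.map(shade_pixel, pixels))
serial_image = [shade_pixel(p) for p in pixels]

# Dependent work: step n cannot start until step n-1 has finished everywhere.
state = [1, 2, 3, 4]
for _ in range(3):
    state = step(state)

print(parallel_image == serial_image)
print(state)
```

Within a single step the elements could still be split across processors, but each processor would need the full previous state shipped to it first, which is where the interconnect comes in.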
With ray-tracing, you can get as much computation from a given amount of scene data as you want, which illustrates why low-resolution renders are faster on a single computer on which they fit, but you can scale the resolution up arbitrarily to the point where no computer can process its chunk faster than its interconnect can deliver that chunk (or its result; they're equivalent since our interconnects tend to have symmetric bandwidth). The resolution of the tomography is limited by the spatial resolution of the scanner in question and not by simple, arbitrary interpolation. Another case of simple calculations on huge data sets is game graphics. The framerate takes a *huge* hit when the problem doesn't fit in the device's memory, because the GPU becomes limited by the slowest path to main memory.
Would this problem scale like that, or would they be limited to a smaller, less interesting set of problems?
I hope I've illustrated how this is a false dichotomy. The only experience I have in this particular area is second-hand, with brain imaging; the analysis isn't even close to real time, which is why the normal way of running the computations is to use very powerful individual systems. No single problem spans the slow interconnects (gigabit ethernet), but they
Emacs used to be really big. Now it's pretty small, even with the graphics interfaces and kitchen sink written in ELisp. My browser's currently burning 300MB (the 3 Beta 5 version used much less RAM than 2.x, and I bought more RAM about the same time the RC version came out, so I don't know how much of the change is which :-) Of course, any time you take a 5-year-old application and run it on a current computer, it turns out to be small and blazingly fast, unless it has to play games with hardware emulation that can make it as slow as the original.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
If you look at most of the Top500 supercomputers these days, they're not Fluorinert-cooled Cray-2s any more - they're almost all lots of processors with some kind of interconnection network to get data between processors, and they're running lots of SIMD to make LINPACK go fast. This machine is using the CPU interconnects because they benchmarked it and found they didn't need to mess with SLI, and it may not have quite the I/O scalability that the Earth Simulator or its faster competitors have, but it's in the same kind of space. It's not near the top, but it's certainly respectable.
...will it run Red Hat?