Nvidia 480-Core Graphics Card Approaches 2 Teraflops
An anonymous reader writes "At CES, Nvidia has announced a graphics card with 480 cores that can crank up performance to reach close to 2 teraflops. The company's GTX 295 graphics card has two GPUs with 240 cores each that can execute graphics and other computing tasks like video processing. The card delivers 1.788 teraflops of performance, which Nvidia claims makes it the fastest single graphics card on the market."
It's not a problem to implement 52342525113 cores. The problem is implementing them within the cost, size, and power drain that an acceptably priced gamer PC case can accommodate.
So far, Nvidia is failing in that respect.
...a Beowulf cluster of these! In all seriousness, I'm waiting for the latest and greatest supercomputers to have huge GPU farms.
SSC
No, seriously... can anything run it at full options yet?
Tibbon
tibbon.com
No, wait. Better to not try then there's no fear of failing. Right!
1.21 Jiggawatts
Yet again, Nvidia showed ATI that it, indeed, has the biggest penis.
That's just great and all, but when can I get a video card that doesn't take up half my case and melt down after 6 months of use? Not to mention one that doesn't cost an arm and a leg.
Color me doubtful but I suspect it's 480 stream processors which isn't anywhere NEAR the same thing as the "cores" on the CPU or even the core of the GPU.
Why has the press suddenly started to call stream processors "cores"? Marketing?
Doesn't the PS3's Cell processor do more than this, and it came out years ago?
I can run Crysis/Warhead at 30fps maxed out at 720p. I have a single 4850.
The problem with video card reviews is that they don't bother testing anything lower than 1920x1080, which has 2.25x as many pixels as 720p.
Crysis takes a lot to run but it has already been tamed as long as you aren't running at 2560x1600 or some other absurd resolution.
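For anyone who wants to check the 2.25x figure, here is a quick sketch of the pixel arithmetic. It assumes "720p" means 1280x720; the helper name `pixel_count` is just for illustration.

```python
def pixel_count(width, height):
    """Number of pixels the card has to shade per frame at this resolution."""
    return width * height

# Assumption: "720p" here means 1280x720.
px_720p = pixel_count(1280, 720)
px_1080p = pixel_count(1920, 1080)
px_2560 = pixel_count(2560, 1600)

# How much more work per frame, relative to 720p.
ratio_1080p = px_1080p / px_720p
ratio_2560 = px_2560 / px_720p

print(f"1920x1080 vs 720p: {ratio_1080p:.2f}x the pixels")
print(f"2560x1600 vs 720p: {ratio_2560:.2f}x the pixels")
```

Pixel count is only a rough proxy for GPU load, but it shows why a card that holds 30fps at 720p can look much worse in a review that starts at 1920x1080.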
One of the benefits of the technology war is that it produces good midrange and low end technology as well. This is particularly true in the case of graphics cards since they are so parallel. They more or less just lop off some of the execution units and maybe slow down the clock and you get a budget card.
Whatever your budget is, there's probably a good card available at that level. Now will it be as fast as the GTX 295? Of course not. However they'll be as fast as they can be at that price/power consumption point.
Don't pitch because some people need/want high end cards. Enjoy the fact that they help subsidize you getting good, cheap midrange cards.
If you want serious suggestions, tell me your budget range and what you want to do and I'll recommend some cards.
how much processing power does Duke Nukem Forever need! That's like supercomputer performance from 10 years ago.
Supplies!
Will this card support OpenCL?
Jumpstart the tartan drive.
Will this be the first card to run Windows Aero at a decent speed?
with CPUs anymore? I'm just going to fill a case with graphics cards and call it a day.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Crytek has done a fantastic job creating Id/John Carmack type fanboyism for their game.
With the sorry state PC gaming is in, with more and more developers leaving or focusing on the console market, Crysis has been latched onto as some sort of holy messiah, the one thing that still justifies their 3-4000 dollar 'rig'.
Lots of silly tech demos that have nothing to do with the actual in game graphics and carefully staged Crysis vs real life comparisons have created a fanatical fanboy worship for the game/engine/company.
Too bad the actual in game graphics are incredibly unimpressive. Even more so when the screen isn't completely covered in foliage.
No kidding! I just ran into my first Nvidia heat-o'-death situation too.
Anyone know of an after-market part to draw air directly over your PCIe cards? This is a problem that's right now solved by the turning-my-graphics-card-into-a-jet-engine solution. It works, but if there's a quieter answer that keeps the graphics power I'd be happy to hear it.
Here's the skinny:
790i comes with 3 PCIe slots so I thought to try SLI with two new cards, and an older one (in the middle thanks to the bridge) for second monitor/TV. The poor middle card just doesn't stand a chance against two 260's, it's like an oven with both elements on.
I've been using RivaTuner to adjust fan speed and watch temperature. Outer cards run ok (44 deg C), but even at Max fan speed the middle card idles at 61, and at normal speed will die if anything tries taxing the card for more than a few minutes.
When *won't* you be able to get a video card that takes up less than half your case and doesn't require its own power supply?
Right now you can still get a high powered graphics card for less than $50 with a small or no fan. But those cards are 2 year old technology. These days all the latest and greatest are essentially a PC within a PC and I doubt the power and cooling requirements will go down with time.
So in 5 years these ridiculously large cards will cost $50 but they'll still be ridiculously large.
10 years ago graphics cards and graphics requirements for games were going neck and neck. Now it seems that graphics cards are outpacing what games actually demand. So you can go with a cheaper card and still get very good quality rendering.
... for Windows 7 (or whatever they call Vista now).
Crysis was a brilliant marketing success built on top of very average engineering.
A whole lot of technical buzzwords and silly tech demos. But the game completely falls on its face in actual in-game graphics anywhere that isn't entirely covered in jungle. The most inane were the 'amazing' real life comparisons where they would take a lot of high-res scanned textures and carefully find the perfect spot to position the camera. Cute, but technically a complete yawn.
Not to mention the game itself was mind numbingly dull. Even by the fairly low fps genre standards.
Because their Tesla boards post nearly a TFLOP of performance for single precision computing, but only 78 MFLOPS for double precision.
Didn't you know the second power connection to your GPU is actually for the oven/space heater function? So it's actually a feature!
Nvidia realized long ago that to maximize play-time they needed a way for users to cook and stay warm near their PCs.
I've made some mean eggs on my case, recipe came from the included Nvidia cook-book.
-Matt
Just think of what the next generation of consoles will have. Microsoft will learn from their mistake (hopefully) and allow for better heat dissipation. And there is no telling what Sony will come up with to try and secure their share in the market. Anyway... these are all hopes of course. My point being, the next gen consoles should deliver some mind blowing experiences.
"The irony when tending a flock of sheep is the dogs you put in place to protect them are genetically mutated wolves"
Can someone please post the link to a how-to guide for convincing your wife/girlfriend of the necessity of owning a graphics card with dual 240-core GPUs? Or, if you are a girl who acknowledges said necessity without a fight, please post a link to your Facebook profile. Thank you in advance.
Apart from, you know, link length.
The most important thing to understand is that these aren't actually 'cores' in the same sense that your Core 2 Duo has two of them. They're shader units. It works more like SIMD than parallelization, only instead of something like SSE that can perform a single operation per clock across 4 packed floating point values it performs the operation on thousands of them.
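To make the SIMD comparison concrete, here is a toy model in Python. It is only a sketch: `madd_lanes` is a made-up name, and the lane widths (1 for a scalar unit, 4 for something SSE-like, 480 for the card's advertised shader count) are assumptions for illustration, not a description of how the hardware actually groups its work.

```python
def madd_lanes(a, b, c, lanes):
    """Compute a[i] * b[i] + c[i] for every i, `lanes` elements per issued operation.

    Returns the results plus the number of operations that had to be issued.
    """
    results = []
    issued = 0
    for start in range(0, len(a), lanes):
        end = start + lanes
        # One "instruction": the same multiply-add applied to a whole group of values.
        results.extend(x * y + z for x, y, z in zip(a[start:end], b[start:end], c[start:end]))
        issued += 1
    return results, issued

n = 960
a = list(range(n))
b = [2] * n
c = [1] * n

scalar_out, scalar_ops = madd_lanes(a, b, c, lanes=1)   # one value at a time
sse_out, sse_ops = madd_lanes(a, b, c, lanes=4)         # 4 packed floats, SSE-style
gpu_out, gpu_ops = madd_lanes(a, b, c, lanes=480)       # assumed: one lane per shader unit

print(scalar_ops, sse_ops, gpu_ops)
```

All three runs produce the same answers; what changes is how many operations get issued. That is the sense in which the shader units are wide rather than independent: every lane runs the same operation.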
If they could slap a billion or a million or even a thousand shader units on a card without actually reducing performance they would, but they can't. At a certain point the bottleneck becomes link length. You can overcome it by increasing voltage but then heat becomes the issue. This is a large part of the reason transistor count is tied to transistor size. NVIDIA isn't "failing" in this respect, they're just succumbing to the laws of physics.
If they could improve performance by slapping 20 or 4 or even 2 of the *actual* cores on each card they would, but they can't. Because it's not an actual processor, it doesn't have fancy features like three levels of cache and a TLB and branch prediction and out-of-order execution. But even if they were engineered to work this way, you can't improve PC performance by slapping in a thousand Core 2 Quads either. A part of the reason Xeons have so much cache is so you can mitigate the penalty of having 8 processors using commodity RAM, but eventually you run up against that bottleneck. Shared resources become saturated much faster than most people expect.
The most efficient way of improving graphics performance is with SLI because you are replicating all of the hardware, the memory and the bus the *actual* core depends on. For the exact same reason, you can extract the most performance out of each CPU core by putting each one in a different machine.
Compare this to the Radeon 4870 X2: two 55nm RV770 GPUs on the same PCB connected by a PCIe bridge. The card also has a "Crossfire X Sideport" interlink (which I think is Hypertransport, although I may be wrong) that directly connects the two GPUs, but it isn't enabled in their drivers at the moment. (You can see it on the PCB: a set of horizontal traces directly linking both GPUs.) One might wonder if they've delayed enabling the direct link because they knew Nvidia would respond this way.
Anyway, it's always great when two companies battle it out, as the consumer always wins.
jdb2
Imagine a beowulf cluster of these!
Disclaimer: The opinions and actions of the US Gov't are in no way representative of those held by this author or its ci
Not even close. A single ATI HD4870X2 card has 2.4 TFLOPS of processing power: 2 (instr/clock with MAD) * 800 (Streaming Processors) * 750 (MHz) * 2 (GPUs) = 2.4 TFLOPS.
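The arithmetic above can be checked with a few lines of Python. This is a sketch of the theoretical peak only, using the numbers quoted in this comment and the 1.788 TFLOPS figure from the story summary; `peak_flops` is an illustrative name, and real workloads will not reach these figures.

```python
def peak_flops(flops_per_clock, stream_processors, clock_hz, gpus):
    """Theoretical peak: every stream processor retires its maximum every clock."""
    return flops_per_clock * stream_processors * clock_hz * gpus

# Numbers from the comment: MAD counts as 2 floating point ops per clock,
# 800 stream processors per GPU, 750 MHz, 2 GPUs on the card.
hd4870x2_flops = peak_flops(2, 800, 750_000_000, 2)
hd4870x2_tflops = hd4870x2_flops / 1e12

# Figure Nvidia quotes for the GTX 295 in the story summary.
gtx295_tflops = 1.788

print(f"HD 4870 X2 peak: {hd4870x2_tflops:.3f} TFLOPS")
print(f"GTX 295 claimed: {gtx295_tflops:.3f} TFLOPS")
print(f"Ratio: {hd4870x2_tflops / gtx295_tflops:.2f}x")
```

Peak FLOPS only tells you how many units there are and how fast they tick, which is why the card with the lower paper number can still win benchmarks.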
I thought the Radeon HD 4870 x2 did 2400 GFLOPs, i.e. 2.4 TFLOPs?
I'm not saying the Radeon is more powerful, but in terms of FLOPs, it is (I thought), faster.
How hot is this thing going to get?
The 4500 has a monster heatsink/fan combo and it has nowhere near that kind of power.
This is graphics turned up to 11
Every time you call tech support, a little kitten dies.
But can it run Vista?
You are now manually breathing.
http://hothardware.com/Articles/NVIDIA-GeForce-GTX-295-Unleashed/
Until NVIDIA starts supporting the development of open source drivers I'm sticking with ATI, no matter how many Blazing Cores of Might NVIDIA might fit onto their chips. While ATI's closed source drivers have their fair share of bugs, and it will be some time before there are good 3D open source drivers for their more recent cards, at least the development has started and ATI has been aiding it, not hobbling it.
Is it really MFLOPS, or GFLOPS? 78 GFLOPS seems a tad high to me, but 78 MFLOPS is much lower than I would expect. A Core 2-based quad core at 3.0 GHz would get a theoretical peak of 48 GFLOPS double precision...
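Here is roughly where the 48 GFLOPS figure comes from, as a sketch. The assumption (not stated in the comment) is that each Core 2 core can retire 4 double precision operations per clock: a 2-wide SSE add plus a 2-wide SSE multiply.

```python
def cpu_peak_flops(cores, clock_hz, dp_flops_per_clock):
    """Theoretical double precision peak for a multi-core CPU."""
    return cores * clock_hz * dp_flops_per_clock

# Assumption: 4 double precision flops per core per clock
# (one 2-wide SSE add and one 2-wide SSE multiply each cycle).
core2_quad_peak = cpu_peak_flops(cores=4, clock_hz=3_000_000_000, dp_flops_per_clock=4)

# The two readings of the "78" figure being questioned.
tesla_if_mflops = 78e6
tesla_if_gflops = 78e9

print(f"Core 2 Quad peak: {core2_quad_peak / 1e9:.0f} GFLOPS")
print(f"If 78 MFLOPS: {tesla_if_mflops / core2_quad_peak:.4f}x the CPU")
print(f"If 78 GFLOPS: {tesla_if_gflops / core2_quad_peak:.3f}x the CPU")
```

Under that assumption, the MFLOPS reading would put the board at a small fraction of a percent of a desktop CPU, while the GFLOPS reading puts it somewhat ahead.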
XML is like violence. If it doesn't solve the problem, use more.
ATI HD4870 X3 anyone?
I'd consider that more likely than Nvidia managing 3 of their giant GPUs on a single card.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
This is the internet. You can say "penis" here.
I mean seriously, as long as they don't publish the hardware specifications so you can write your own software for it, it's pretty much useless. The only thing you can do with it is play games. And even then you have to fear every little software update as it might trigger some bug in the binary-only drivers the manufacturer provides.
Learning how to put these CUDA cores to work for more than games is a great new opportunity because each new NVIDIA card has more of these resources. Unfortunately this seems to be rocket science and just because engineers can build these boards doesn't mean that the software community is ready or able to design software that benefits from this architecture. When they do, things will get very interesting. Hardware people decided to go multicore because it was getting harder to go faster with uni-core processors. Software people got told they would have to write a different kind of software to stay competitive, and this area will be very important in the future. Actually it is right now. I noticed Dell is pushing 2.5 GHz quad core machines with six gigs of memory at Costco. I don't know how much of the contemporary software can properly utilize these cores, but time will tell. As the programming languages get built-in support for multi-core programming, things will improve. I noticed there is some nice support in Python.
So does it make coffee also?
The most efficient way of improving graphics performance is with SLI because you are replicating all of the hardware, the memory and the bus the *actual* core depends on. For the exact same reason, you can extract the most performance out of each CPU core by putting each one in a different machine.
With the amazing reduction in the price of LCD monitors, I wonder if we won't see a resurgence in multiscreen gaming. Then you could use multiple cards without SLI and see performance gains, just as the increase in multiprocessing has made it possible to use multicore processors (or just multiple processors) to see performance gains instead of having to always increase clock rate.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
A part of the software design process is how to break up the main application into the different components. With multi-threading, the design needs to figure out what can be handled in a different thread, and if having a different thread for that function is worth the code administration needed to tie things all together.
Remember, it is fairly easy to make a different thread and have it do what you want it to do. The difficulty is in how to tie the different threads together to make the application work as expected.
A program with multiple threads never executes exactly the same way twice as the threads do not execute synchronously. Real-time conditions and resources affect when and how these threads start, what resources they have, and which run faster. Synchronization and communication of and between the threads adds to the complexity.
This is as opposed to a single threaded program that runs more or less exactly the same from one execution to the next starting at the beginning and running until the end. What can change is the amount of resources and the CPU load during execution, but you can single step a single threaded program multiple times and see the same path each time. With multi-threaded, it is like a pinball machine with multiple balls at the same time. They go different ways and sometimes they bounce off each other.
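A small Python sketch of the pinball effect described above. It assumes nothing beyond the standard `threading` module; `run_workers` and the counts are illustrative. The lock is the "tying together" part: it keeps the shared counter correct, while the order in which threads finish is left to the scheduler and may differ from run to run.

```python
import threading

def run_workers(n_threads=4, increments=10_000):
    """Start several threads that all bump one shared counter."""
    counter = 0
    lock = threading.Lock()
    finish_order = []

    def worker(worker_id):
        nonlocal counter
        for _ in range(increments):
            # Synchronization: only one thread may update the counter at a time.
            with lock:
                counter += 1
        with lock:
            finish_order.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, finish_order

total, order = run_workers()
print("total:", total)          # the same every run, thanks to the lock
print("finish order:", order)   # may change between runs
```

The total is reproducible because the lock serializes the updates; the finish order is not, because nothing in the program constrains it. Single-stepping a program like this shows one possible interleaving, not the only one.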
The hardware manufacturers dumped this multi-core technology on us without laying a foundation of software engineering support. I would like to see some standards and language support beyond the normal thread packages inherited from Unix. Intel's threaded toolbox is one candidate. Maybe NIST should have a contest, like they do with encryption methods. I just don't think it is efficient for us to have to roll our own multi-core support at the application level, if that means rolling our own debugging support at the same time. The more cores, the harder this is.
It looks to me like today, Windows uses 1.x cores and what is left over gets used by whatever applications are trying to run. Since Microsoft always uses up the majority of the resources, memory and cycles after each advance in technology, maybe we should have them use one core for system threads, and devote at least one other to the application threads. I know this is not exactly simple because of the way code slips back and forth between kernel and user space as operating systems evolve, but you know what I mean.
I am looking at developing an application that utilizes multiple cores (on Apple Mac Pro) and I have questions like how do you keep the OS hands off the cores you want to delegate to the application. I have about six threads I want assigned to their own cores exclusively. Core ownership is not handled in a straightforward manner. We have a ways to go before these things become clear.
I am looking at developing an application that utilizes multiple cores (on Apple Mac Pro) and I have questions like how do you keep the OS hands off the cores you want to delegate to the application.
What exactly are you trying to do? Are you just trying to get as much CPU time as possible? If you're writing it for OSX you could try NSOperation and NSOperationQueue. Or you could grab a C++ compiler with support for OpenMP and do a lot of loop-level parallelization. That would be the easiest route for most uses. Or if you really actually want dedicated access to a processor you could always write it for Windows 3.1 and just never call GetMessage.
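For a feel of what loop-level parallelization looks like, here is a rough Python analogue using the standard library's `concurrent.futures` instead of OpenMP. It is a sketch, not a substitute: the function names are made up, and in CPython the interpreter lock means threads will not speed up pure-Python arithmetic like this. The point is the shape: split the loop's range into chunks, hand the chunks to a pool, and combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    """The loop body, applied to one slice of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(values, workers=4):
    """Split one big loop into per-worker chunks and combine the partial sums."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool decides which thread runs which chunk; the OS decides which core.
        return sum(pool.map(sum_of_squares, chunks))

data = list(range(1000))
result = parallel_sum_of_squares(data)
print(result)
```

Note that nothing here asks for a particular core. The code only describes the work that can run side by side and leaves placement to the pool and the scheduler.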
You cannot guarantee exclusive access to a processor. Modern schedulers are specifically designed to prevent applications from doing this: all you can do is ask the operating system to return to your thread more often.
Programmers don't like to remember this, but our code probably isn't the most important code running on a machine. And even if it really, truly is, the OS has absolutely no way of telling the difference between you and some third-rate tray application that would take over an entire CPU just because it can.
And having multithreading "dumped" on us isn't as huge a deal as you make it. It's perfectly possible to develop multithreaded code using existing C interfaces and the various OS-specific synchronization primitives, and virtually every nontrivial application has done this ever since Windows got a preemptive scheduler. Read/write reordering is much more inconvenient.
I wonder when the aircrack-ng team will come up with a solution to use this power to bruteforce WPA keys. Elcomsoft already does this: http://mobile.slashdot.org/article.pl?sid=08/10/12/1724230 They already claimed a 100x speed increase, so I wonder what can be done with this new card.