Nvidia 480-Core Graphics Card Approaches 2 Teraflops
An anonymous reader writes "At CES, Nvidia has announced a graphics card with 480 cores that can crank up performance to reach close to 2 teraflops. The company's GTX 295 graphics card has two GPUs with 240 cores each that can execute graphics and other computing tasks like video processing. The card delivers 1.788 teraflops of performance, which Nvidia claims makes it the fastest single graphics card on the market."
It's not a problem to implement 52342525113 cores. The problem is implementing them at a cost, size, and power drain that an acceptably priced gamer PC case can accommodate.
so far, nvidia is failing in that respect.
Read radical news here
No, seriously... can anything run it at full options yet?
Tibbon
tibbon.com
1.21 Jiggawatts
Yet again, Nvidia showed ATI that it, indeed, has the biggest penis.
That's just great and all, but when can I get a video card that doesn't take up half my case and melt down after 6 months of use? Not to mention one that doesn't cost an arm and a leg.
Color me doubtful but I suspect it's 480 stream processors which isn't anywhere NEAR the same thing as the "cores" on the CPU or even the core of the GPU.
Why has the press suddenly started to call stream processors "cores"? Marketing?
I'm waiting for the latest and greatest supercomputers to have huge GPU farms.
Just wait until they perfect rapid fabrication and live expansion. GPU farming is the future, fabricating additional cores on demand.
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
I can run Crysis/Warhead at 30fps maxed out at 720p. I have a single 4850.
The problem with video card reviews is that they don't bother testing anything lower than 1920x1080, which has 2.25x as many pixels as 720p (1280x720).
Crysis takes a lot to run but it has already been tamed as long as you aren't running at 2560x1600 or some other absurd resolution.
218 GFlops
http://www.realworldtech.com/page.cfm?ArticleID=RWT072405191325&p=2
A single 8800 kills the Cell and the video processor in the PS3 combined.
One of the benefits of the technology war is that it produces good midrange and low end technology as well. This is particularly true in the case of graphics cards since they are so parallel. They more or less just lop off some of the execution units and maybe slow down the clock and you get a budget card.
Whatever your budget is, there's probably a good card available at that level. Now will it be as fast as the GTX 295? Of course not. However they'll be as fast as they can be at that price/power consumption point.
Don't pitch because some people need/want high end cards. Enjoy the fact that they help subsidize you getting good, cheap midrange cards.
If you want serious suggestions, tell me your budget range and what you want to do and I'll recommend some cards.
how much processing power does Duke Nukem Forever need! That's like supercomputer performance from 10 years ago.
Supplies!
Will this card support OpenCL?
Jumpstart the tartan drive.
According to:
http://en.wikipedia.org/wiki/Cell_(microprocessor)
The ps3 cell would be capable of 1 teraflop, IF you could keep it fed. The nvidia part is actually getting that level of throughput.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
with CPUs anymore? I'm just going to fill a case with graphics cards and call it a day.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
No kidding! I just ran into my first Nvidia heat-o'-death situation too.
Anyone know of an after-market part to draw air directly over your PCIe cards? This is a problem that's right now solved by the turning-my-graphics-card-into-a-jet-engine solution. It works, but if there's a quieter answer that keeps the graphics power I'd be happy to hear it.
Here's the skinny:
790i comes with 3 PCIe slots so I thought to try SLI with two new cards, and an older one (in the middle thanks to the bridge) for second monitor/TV. The poor middle card just doesn't stand a chance against two 260's, it's like an oven with both elements on.
I've been using RivaTuner to adjust fan speed and watch temperature. Outer cards run ok (44 deg C), but even at Max fan speed the middle card idles at 61, and at normal speed will die if anything tries taxing the card for more than a few minutes.
--- Need web hosting?
When *won't* you be able to get a video card that takes up less than half your case and doesn't require its own power supply?
Right now you can still get a high powered graphics card for less than $50 with a small or no fan. But those cards are 2 year old technology. These days all the latest and greatest are essentially a PC within a PC and I doubt the power and cooling requirements will go down with time.
So in 5 years these ridiculously large cards will cost $50 but they'll still be ridiculously large.
10 years ago graphics cards and graphics requirements for games were going neck and neck. Now it seems that graphics cards are outpacing what games actually demand. So you can go with a cheaper card and still get very good quality rendering.
Work Safe Porn
Will this be the first card to run Windows Aero at a decent speed?
No. It won't be the first.
Of course, that's because the first cards to run Windows Aero at a decent speed were made several years ago.
... for Windows 7 (or whatever they call Vista now).
Which is why PC gaming will always be better than console gaming. *ducks beneath flamewar*. Seriously though, my PS3 is for BluRays and my 360 is for streaming NetFlix movies and playing Rock Band. That's it. If I want to play a real game... it's on the PC.
You are using English. Please learn the difference between loose and lose; they're, there, and their; your and you're.
Not gonna happen.
There's a lot of flops, sure, but they're arranged in a long pipeline where the only input is "texture map" and the only output is "frame buffer". That's not much good for general purpose processing.
Oh, and they're only single precision, which wipes out another big chunk of possibilities.
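To make the single-precision worry concrete, here is a minimal Python sketch using only the standard library. It simulates 32-bit accumulation by rounding through `struct` after every addition. The helper names (`to_f32`, `sum_f32`, `sum_f64`) are made up for illustration, and this says nothing about how any particular GPU rounds; it only shows how quickly 32-bit error can pile up in a long running sum.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate while rounding to single precision after every add."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

def sum_f64(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 100_000
exact = 10_000.0
err32 = abs(sum_f32(values) - exact)
err64 = abs(sum_f64(values) - exact)
print("single-precision error:", err32)
print("double-precision error:", err64)

# 2**24 + 1 has no 32-bit float representation, so the +1 is lost entirely.
print(to_f32(16777216.0 + 1.0) == 16777216.0)
```

Whether that matters depends on the problem: plenty of workloads tolerate it, and long accumulations are where it tends to bite.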
No sig today...
Because their Tesla boards post nearly a TFLOP of performance for single precision computing, but only 78 MFLOPS for double precision.
Didn't you know the second power connection to your GPU is actually for the oven/space heater function? So it's actually a feature!
Nvidia realized long ago that to maximize play-time they needed a way for users to cook and stay warm near their PCs.
I've made some mean eggs on my case, recipe came from the included Nvidia cook-book.
-Matt
Just think of what the next generation of consoles will have. Microsoft will learn from their mistake (hopefully) and allow for better heat dissipation. And there is no telling what Sony will come up with to try and secure their share in the market. Anyway... these are all hopes of course. My point being, the next gen consoles should deliver some mind blowing experiences.
"The irony when tending a flock of sheep is the dogs you put in place to protect them are genetically mutated wolves"
Can someone please post the link to a how-to guide for convincing your wife/girlfriend of the necessity of owning a graphics card with dual 240-core GPUs? Or, if you are a girl who acknowledges said necessity without a fight, please post a link to your Facebook profile. Thank you in advance.
Hmm? The GP said Beowulf cluster. Where in that did you read general purpose computing?
There are many HPC problems that you can solve adequately with single precision.
Apart from, you know, link length.
The most important thing to understand is that these aren't actually 'cores' in the same sense that your Core 2 Duo has two of them. They're shader units. It works more like SIMD than parallelization, only instead of something like SSE that can perform a single operation per clock across 4 packed floating point values it performs the operation on thousands of them.
If they could slap a billion or a million or even a thousand shader units on a card without actually reducing performance they would, but they can't. At a certain point the bottleneck becomes link length. You can overcome it by increasing voltage but then heat becomes the issue. This is a large part of the reason transistor count is tied to transistor size. NVIDIA isn't "failing" in this respect, they're just succumbing to the laws of physics.
If they could improve performance by slapping 20 or 4 or even 2 of the *actual* cores on each card they would, but they can't. Because it's not an actual processor, it doesn't have fancy features like three levels of cache and a TLB and branch prediction and out-of-order execution. But even if they were engineered to work this way, you can't improve PC performance by slapping in a thousand Core 2 Quads either. A part of the reason Xeons have so much cache is so you can mitigate the penalty of having 8 processors using commodity RAM, but eventually you run up against that bottleneck. Shared resources become saturated much faster than most people expect.
The most efficient way of improving graphics performance is with SLI because you are replicating all of the hardware, the memory and the bus the *actual* core depends on. For the exact same reason, you can extract the most performance out of each CPU core by putting each one in a different machine.
I'm an old gamer who started off on both an Atari 2600 and a 286 with CGA graphics. I've played just about everything in between then and now. Currently, I enjoy gaming on my PS3 and on a PC with 8GB of RAM, a quad-core CPU, and an nVidia 8800GT card.
The whole PC vs. Console war is just dumb. Anyone that relates to me will tell you that a PC has the potential to be the best platform, but the games are coded to be open ended (review the plethora of video, graphics, input and audio option settings to choose from) to capture the largest market share. A console will run with inferior specs when compared to a high-end PC, but its games have been tuned and optimized just for that platform. Without question, the moment you play a console game, it will run as designed.
So, which platform *is* better? Depends on a lot of things. Will that latest game run on your PC to your own satisfaction? Or do you prefer games that run flawlessly on both your console and everyone else's, thus leveling the playing field?
Life is not for the lazy.
Compare this to the Radeon 4870 X2 : 2 55nm RV770 GPUs on the same PCB connected by a PCIe bridge although the card has a "Crossfire X Sideport" interlink ( which I think is Hypertransport, although I may be wrong ) that directly connects the two GPUs, which isn't enabled in their drivers at the moment. (you can see it on the PCB -- a set of horizontal traces directly linking both GPUs ) One might wonder if they've delayed enabling the direct link because they knew Nvidia would respond this way.
Anyway, it's always great when two companies battle it out, as the consumer always wins.
jdb2
Imagine a beowulf cluster of these!
Disclaimer: The opinions and actions of the US Gov't are in no way representative of those held by this author or its ci
Not even close. A single ATI HD4870X2 card has 2.4 TFLOPS of processing power: 2 (instr/clock with MAD) * 800 (Streaming Processors) * 750 (MHz) * 2 (GPUs) = 2.4 TFLOPS.
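For anyone who wants to check that arithmetic, here is a rough Python sketch of the peak-throughput formula. The `peak_gflops` helper is a made-up name. The Radeon inputs are the ones quoted in the comment above; the GTX 295 inputs (3 flops per clock per core, 1242 MHz shader clock) are my assumption and do not come from this thread, though they do land on the 1.788 teraflops in the summary. These are theoretical peaks, not measured performance.

```python
def peak_gflops(flops_per_clock, stream_processors, clock_mhz, gpus=1):
    """Theoretical peak in GFLOPS: flops/clock * units * clock (MHz) * GPUs / 1000."""
    return flops_per_clock * stream_processors * clock_mhz * gpus / 1000.0

# Figures quoted in the thread for the Radeon HD 4870 X2.
radeon_4870_x2 = peak_gflops(2, 800, 750, gpus=2)

# ASSUMPTION (not from the thread): 3 flops/clock per core, 1242 MHz shader clock.
gtx_295 = peak_gflops(3, 240, 1242, gpus=2)

print("HD 4870 X2:", radeon_4870_x2, "GFLOPS")
print("GTX 295:   ", gtx_295, "GFLOPS")
```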
My computer cost under $800 and runs it just fine with all settings on high.
The people who pay $4000 for a 'rig' do it as a hobby. Some people build fast cars, some people build model airplanes, some people build shiny computers as a hobby, and there are even some people that piss all over other people's hobbies as a hobby.
You're saying Crysis is incredibly unimpressive, compared to what? I would like to know what is so much more impressive as to make Crysis look incredibly unimpressive.
I thought the Radeon HD 4870 x2 did 2400 GFLOPs, i.e. 2.4 TFLOPs?
I'm not saying the Radeon is more powerful, but in terms of FLOPs, it is (I thought), faster.
But can it run Vista?
You are now manually breathing.
Is Beowulf only for some special subset of problems?
Where did I say "you can't solve problems if you only have single precision"?
Supercomputers are expensive because of the exotic memory architectures needed to let all those CPUs have random access to the dataset and so they can communicate with each other.
If a problem can be broken down into little discrete chunks you don't need a supercomputer, you need a cluster of cheap computers.
GPUs are the next step down from that - very good if your problem fits their architecture, but only a subset of real problems will.
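The "little discrete chunks" case can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names are invented, the workload (a sum of squares) is a toy, and a thread pool on one machine stands in for the cluster nodes. The point is the shape of the problem: each chunk is processed with no reference to any other chunk, and only the small partial results get combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Cut data into at most n_chunks contiguous, independent pieces."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def work(chunk):
    """Per-chunk job: needs nothing from any other chunk."""
    return sum(x * x for x in chunk)

def chunked_sum_of_squares(data, n_workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)

data = list(range(1000))
result = chunked_sum_of_squares(data)
print(result)
```

A problem where every worker needs random access to the whole dataset does not decompose this way, which is the expensive case the comment describes.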
Horse shit.
List components and when they were purchased.
For the purpose of playing Crysis, a "computer" consists of:
Case
Power Supply
Motherboard
CPU
GPU
RAM
Hard Drive
Optical Drive
Monitor
OS License
I'll let you scrounge for your keyboard, mouse, and speakers/headphones. You can use onboard sound, too.
http://hothardware.com/Articles/NVIDIA-GeForce-GTX-295-Unleashed/
Until NVIDIA starts supporting the development of open source drivers I'm sticking with ATI, no matter how many Blazing Cores of Might NVIDIA might fit onto their chips. While ATI's closed source drivers have their fair share of bugs, and it will be some time before there are good 3D open source drivers for their more recent cards, at least the development has started and ATI has been aiding it, not hobbling it.
The Nvidia Tesla offers IEEE 754 double precision floating point. It's essentially a headless graphics card programmed with CUDA. My guess is that the latest GeForce and Quadro do as well, but I'm too lazy to dig around. Today's shaders are a lot more sophisticated than the ones on the GeForce 3.
I'll actually bite, just to show you how reasonable the hardware requirements are for Crysis. I will only price out the tower, no monitor, so you can compare this gaming PC to a console. Consoles don't come with a TV, and the display has no impact on performance.
Case: Antec Sonata - $100
Power Supply: Antec 500w PSU - comes with the case
Motherboard: Asus P5KPL-CM - $55
CPU: Intel Q6600 2.4ghz quad - $190
GPU: Geforce 9800GT 512mb - $125
RAM: 4gb DDR2-800 - $50
Hard Drive: Seagate 500gb SATA - $60
Optical Drive: Any DVD-RW - $20
OS License: XP or Vista OEM - $100
So we're sitting at $700 even, so roughly $800 tax in. If you're not comfortable assembling it yourself, most shops will do it for an extra $40.
Such a PC, not at all bleeding edge nor prohibitively expensive, will run Crysis on High Detail, in 1280x1024 at 30fps. It will also run in widescreen 1920x1200, but you'll drop to 15-20 fps, which is a bit low.
-Billco, Fnarg.com
Is it really MFLOPS, or GFLOPS? 78 GFLOPS seems a tad high to me, but 78 MFLOPS is much lower than I would expect. A Core 2 based quad core at 3.0 GHz would get a theoretical peak of 48 GFLOPS in double precision...
XML is like violence. If it doesn't solve the problem, use more.
See http://www.pcworld.com/businesscenter/article/155242/inside_tsubame_the_nvidia_gpu_supercomputer.html, for example.
"I'm waiting for the latest and greatest supercomputers to have huge GPU farms."
This thing just now is hitting 2 TFLOPS where the PS3's theoretical performance is 2 TFLOPS.
Folding @ home? Granted, this thing only has 8 cores and a custom GPU but hey, that's still putting it on par with the new nVidia card alone, let's not mention price, which is roughly equivalent.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
ATI HD4870 X3 anyone?
I'd consider that more likely than Nvidia managing 3 of their giant GPU's on a single card.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
I'll do one better.
Case = bullshit $20 wonderjob at a pawn shop.
PSU - 700w Rocketfish for 70 bucks.
mobo/CPU combo - PC Chips with dual-core AMD Athlon64 X2 5200+ - 60 bucks
RAM - 4GB cheapo RAM - 20 bucks from craigslist.
GPU - 512MB 9800GTX+ - 175 from pricewatch.
Hard Drive - 80GB 7200RPM WD - FREE from craigslist, complete with porn!
Optical drive - DAEMON TOOLS, but I've found the one in my machine for 10 bucks
OS License - XP Pro - 100.
455 bucks, Crysis at 1920x1080 at high settings. I get very few framerate issues, in fact I only got them during the battleship invasion part of the game.
What you've done is called style. Style = do more with less. I'd mod you insightful if I could.
I mean seriously, as long as they don't publish the hardware specifications so you can write your own software for it, it's pretty much useless. The only thing you can do with it is play games. And even then you have to fear every little software update as it might trigger some bug in the binary-only drivers the manufacturer provides.
Learning how to put these CUDA cores to work for more than games is a great new opportunity because each new NVIDIA card has more of these resources. Unfortunately this seems to be rocket science and just because engineers can build these boards doesn't mean that the software community is ready or able to design software that benefits from this architecture. When they do, things will get very interesting. Hardware people decided to go multicore because it was getting harder to go faster with uni-core processors. Software people got told they would have to write a different kind of software to stay competitive, and this area will be very important in the future. Actually it is right now. I noticed Dell is pushing 2.5 GHz quad core machines with six gigs of memory at Costco. I don't know how much of the contemporary software can properly utilize these cores, but time will tell. As the programming languages get built-in support for multi-core programming, things will improve. I noticed there is some nice support in Python.
The most efficient way of improving graphics performance is with SLI because you are replicating all of the hardware, the memory and the bus the *actual* core depends on. For the exact same reason, you can extract the most performance out of each CPU core by putting each one in a different machine.
With the amazing reduction in the price of LCD monitors, I wonder if we won't see a resurgence in multiscreen gaming. Then you could use multiple cards without SLI and see performance gains, just as the increase in multiprocessing has made it possible to use multicore processors (or just multiple processors) to see performance gains instead of having to always increase clock rate.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
To me the bummer of console gaming, and the one thing that keeps me playing the occasional game on my PC, is the game controller situation. When the status quo in console gaming is to let me use an arbitrary controller (USB or Bluetooth HID anyone?) and remap game functions to arbitrary controls, then I will be able to say goodbye to PC gaming, and never look back. I'm more than willing to build a cluster of game consoles if game developers decide to head in that direction, too. But I really wish that console games would stop pissing on my hands :(
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Yes, Nvidia cards are really behind in teraflops to the PS3 card. Oh wait, it was NVIDIA who built the RSX (Ps3 graphics card). Go figure.
And those cores are from the Cell processor, not the graphics card, and there are 7, not 8. The original design of the Cell uses 8, but Sony uses the "spares" with one broken core to reduce costs.
Dilbert RSS feed
A part of the software design process is how to break up the main application into the different components. With multi-threading, the design needs to figure out what can be handled in a different thread, and if having a different thread for that function is worth the code administration needed to tie things all together.
Remember, it is fairly easy to make a different thread and have it do what you want it to do. The difficulty is in how to tie the different threads together to make the application work as expected.
A program with multiple threads never executes exactly the same way twice as the threads do not execute synchronously. Real-time conditions and resources affect when and how these threads start, what resources they have, and which run faster. Synchronization and communication of and between the threads adds to the complexity.
This is as opposed to a single threaded program that runs more or less exactly the same from one execution to the next starting at the beginning and running until the end. What can change is the amount of resources and the CPU load during execution, but you can single step a single threaded program multiple times and see the same path each time. With multi-threaded, it is like a pinball machine with multiple balls at the same time. They go different ways and sometimes they bounce off each other.
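The pinball picture can be shown with a small Python sketch using the standard `threading` module. The function name and the shared-counter workload are invented for illustration. With `use_lock=False` the read and the write are separate steps, so two threads can read the same value and one update gets lost; how many get lost depends on timing and can differ from run to run. With the lock, the result is the same every time.

```python
import threading

def run_counter(n_threads, increments, use_lock):
    """Have several threads bump one shared counter; return the final value."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Read-modify-write happens as one protected step.
                with lock:
                    counter["value"] += 1
            else:
                # Unprotected: another thread may run between these two lines.
                current = counter["value"]
                counter["value"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

print("with lock:   ", run_counter(4, 10_000, use_lock=True))
print("without lock:", run_counter(4, 10_000, use_lock=False))
```

The unlocked run may well print the full total on a given machine; the trouble is that nothing guarantees it, which is exactly why such bugs are hard to reproduce under a debugger.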
The hardware manufacturers dumped this multi-core technology on us without laying a foundation of software engineering support. I would like to see some standards and language support beyond the normal thread packages inherited from Unix. Intel's threaded toolbox is one candidate. Maybe NIST should have a contest, like they do with encryption methods. I just don't think it is efficient for us to have to roll our own multi-core support at the application level, if that means rolling our own debugging support at the same time. The more cores, the harder this is. It looks to me like today, Windows uses 1.x cores and what is left over gets used by whatever applications are trying to run. Since Microsoft always uses up the majority of the resources, memory and cycles after each advance in technology, maybe we should have them use one core for system threads, and devote at least one other to the application threads. I know this is not exactly simple because of the way code slips back and forth between kernel and user space as operating systems evolve, but you know what I mean.
I am looking at developing an application that utilizes multiple cores (on Apple Mac Pro) and I have questions like how do you keep the OS hands off the cores you want to delegate to the application. I have about six threads I want assigned to their own cores exclusively. Core ownership is not handled in a straightforward manner. We have a ways to go before these things become clear.
I am looking at developing an application that utilizes multiple cores (on Apple Mac Pro) and I have questions like how do you keep the OS hands off the cores you want to delegate to the application.
What exactly are you trying to do? Are you just trying to get as much CPU time as possible? If you're writing it for OSX you could try NSOperation and NSOperationQueue. Or you could grab a C++ compiler with support for OpenMP and do a lot of loop-level parallelization. That would be the easiest route for most uses. Or if you really actually want dedicated access to a processor you could always write it for Windows 3.1 and just never call GetMessage.
You cannot guarantee exclusive access to a processor. Modern schedulers are specifically designed to prevent applications from doing this: all you can do is ask the operating system to return to your thread more often.
Programmers don't like to remember this, but our code probably isn't the most important code running on a machine. And even if it really, truly is, the OS has absolutely no way of telling the difference between you and some third-rate tray application that would take over an entire CPU just because it can.
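As a rough illustration of "all you can do is ask", here is a Python sketch built on `os.sched_setaffinity`, which exists on Linux but not on OS X, hence the guard. The `request_cpus` wrapper is a hypothetical name. Note what the call does and does not do: it restricts where this process may run, but it does nothing to keep the OS or other processes off those CPUs, so it is a request about placement and not ownership of a core.

```python
import os

def request_cpus(cpus):
    """Ask the scheduler to run this process only on the given CPU numbers.

    Returns the affinity set in effect afterwards, or None on platforms
    without os.sched_setaffinity (for example OS X).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we may currently run on
    wanted = set(cpus) & allowed        # never ask for CPUs we cannot use
    if not wanted:
        return allowed                  # nothing usable requested; leave as-is
    os.sched_setaffinity(0, wanted)     # pid 0 means the calling process
    return os.sched_getaffinity(0)

granted = request_cpus({0})
print("running on CPUs:", granted)
```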
And having multithreading "dumped" on us isn't as huge a deal as you make it. It's perfectly possible to develop multithreaded code using existing C interfaces and the various OS-specific synchronization primitives, and virtually every nontrivial application has done this ever since Windows got a preemptive scheduler. Read/write reordering is much more inconvenient.
You apparently miss the point of what I'm saying, so I'm going to break it down for you.
It took nvidia 480 cores to get performance of what 8 cores could do. The GPU isn't counted in the PS3 as part of the theoretical performance of the PS3.
I'll bet this card also eats more power compared to a full-blown PS3. That's TOO FUCKING MUCH SILICON.
"are 7, not 8. The original design of the Cell uses 8, but Sony uses the "spares" with one broken core to reduce spendings."
WRONG. The 8th core is RESERVED FOR REDUNDANCY, meaning in case one of the others fails it's there to take over. Not one fucking thing is broken.
You didn't even bother to read the specs directly from Sony's website, did ya? Here, I'll quote it since you don't know.
"* 1 of 8 SPEs reserved for redundancy total floating point performance: 218 GFLOPS"
Now, then. Back to WRITING CODE ON THE PS3. Later.
Which is why PC gaming will always be better than console gaming.
And board games will always be better than either PC or console gaming, at least for multi-player gaming. =)
/.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
I wonder when the aircrack-ng team will come up with a way to use this power to brute-force WPA keys. Elcomsoft already does this: http://mobile.slashdot.org/article.pl?sid=08/10/12/1724230 They already claimed a 100x speed increase, so I wonder what can be done with this new card.
So you've built a PC that will fail quickly because of the shitty shitty ram, psu, and motherboard.
You left out the cost of the hard drive.
You left out the monitor, which is 1920 x 1080.
I know for a fact you'll have frame rate issues (60 fps) with that hardware. If you're one of those people who think 30 fps is fine, then all I can do is lol.
And if you're using current prices, let's keep in mind that Crysis is now 14 months old.
30 fps is not acceptable.
60 fps or higher.
I can run crysis on a piece of shit if you want, and put it at the highest settings the graphics card supports, but if it runs like shit, you're missing the point.
Add in monitor cost, retard. It's part of the computer. No one said anything about consoles, TVs, electricity, houses, clothes, food, etc.
And keep in mind, Crysis came out 14 months ago!