AMD RV790 Architecture To Change GPGPU Landscape?
Vigile writes "To many observers, the success of the GPGPU landscape has really been pushed by NVIDIA and its line of Tesla and Quadro GPUs. While ATI was the first to offer support for consumer applications like Folding@Home, NVIDIA has since taken command of the market with its CUDA architecture and programs like Badaboom and others for the HPC world. PC Perspective has speculation that points to ATI addressing the shortcomings of its lineup with a revised GPU known as RV790 that would both dramatically increase gaming performance as well as more than triple the compute power on double precision floating point operations — one of the keys to HPC acceptance."
I hope all these new things will be compatible with OpenCL.
... the "rename the same old shit four times to try and con people"-market, that's for sure.
Waiting for GPGPGPUs
What in the screaming blue hell is a GPGPU?
So this is what some anonymous guy on the internet thinks might happen? Granted, he has a lot of material in there, but in the end it's all just guesswork. Apparently he's a big fan of cheaper lower end video cards as well, and is hoping that ATI releases one.
I read the internet for the articles.
General-purpose GPUs = massively parallel floating-point operations. (Think matrix math, real-time sims, lab testing, SETI, etc.)
Still separate from a CPU, which has additional capabilities.
For the older folks, think of this as a math co-processor :) [with its own fan]
...because since I learned that BOINC now supports CUDA (but still has no love for GPGPU), I'm about to ditch my ATI cards for a few Nvidia ones.
It is by my will alone my thoughts acquire motion; it is by the juice of the coffee bean that the thoughts acquire speed
As far as I know, the RV790 will be in the R600/R700 family and will work almost perfectly with existing R600/R700 code. While I have no guarantees on this, current talks with AMD employees haven't given off any indication that this chipset will be radically different from its cousins.
~ C.
Meanwhile, isn't this just yet another area that AMD/ATI is playing catchup? Not a role I'd like to be in against Intel and NVidia.
ATI or AMD?
AMD's double-precision floating-point performance is already great. What they lack is the rest of it. The programming model is pretty bad compared to CUDA (nobody is using Brook+), and they seem to be basically waiting for OpenCL to fix that. The bottlenecks in most attempts to use AMD chips for GPGPU code are also not really the floating-point units themselves, but the rest of the architecture; it's hard to keep the ALUs fed with your data without a magic compiler, a better programming model, a better architecture, or some combination of those.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Well, waiting for cheapie GPGPUs, anyway.
What I want from the GPU is features like what the CPUs have so that the GPU can have multiple VMs running in it. The only reason that I don't run inside of a VM as my primary computing environment is because graphics acceleration pretty much sucks in it. When AMD bought ATI I expected virtualized video to be one of their early announcements.
Imagine if your VMed OS could believe that it had 100% control of the video card, but your video card would display on its own 'surface', and still use full hardware acceleration for the process. As far as I can tell, video is the only serious stumbling block left in virtualizing the x86 architecture.
Some guy who does not know very much posts a long speculation article, all of the speculation done with his limited understanding. And then this is posted as news.
RV790 is just a higher-clocked RV770. There are no more shader units. There are no shader units converted to 64-bit. It's just a ~10% clock speed increase, giving about 10% more performance.
RV800 will come at the end of the year; that one will have much more power.
And, of course, like with most people who do a "My favored company will come out with the bestest thing EVAR!" he's ignoring the fact that nVidia won't sit still. I don't know what's coming next from nVidia. What I do know is they currently have a powerful card for gaming and GPGPU (GTX285) that does support double precision as well as single precision, though DP is much slower. So, fairly safe to say their next generation card will also support DP, and will probably be faster than their current card.
To me, this just seems like fanboy rambling. Yes I'm sure ATi's next card will be better than their current cards. What of it? Unless you've got specifics AND specifics of what their competitor is doing, you can't really say how it'll change things. I mean even if you found out that ATi was making a card that was 10x as fast as their current one, that wouldn't mean anything unless you also knew nVidia wasn't.
We'll know what happens..... When it happens.
Actually the AMD Firestream is far superior to the nVidia, for several reasons including true double precision, and generally better performance.
Further the power consumption on the Firestream cards is far lower than the nVidia cards.
However, as usual, shoddy AMD marketing has left their offerings out in the "What is this" cloud.
Until my CAD programs use DirectX, I won't call it 'standard'. Sure, most games on Windows use DirectX, but that doesn't mean OpenGL is pointless.
Just like the parent says: the actual article is a work of fiction and speculation with no hard facts on future products.... merely "what if's".
His predictions about double precision appear to be based on a misunderstanding about how the 4800 series works. Here's what he says about it: "That 680 GFLOPs would be assuming AMD converts 2/5 of the stream units to double precision. Now, if AMD were to convert 3/5 of those units to double precision, a single card could do slightly over 1 TFLOP." He seems to believe that 1/5 of the stream units support double precision, and they could simply convert some additional ones to support it as well. But that isn't the case. In fact, it has no double precision units. Instead, it can have four single precision units work together to act as one double precision unit. That's how they were able to support double precision without using a lot of extra silicon. Actually having dedicated double precision units would require far more silicon, and would be a major change.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
I would rather have quality Open Source drivers. Yeah, you threw the specs "over the wall", but it would be nice if you were a bit more active. Like giving us an actual Open Source driver. Or patches. Or something. We shouldn't be doing your work for you.
Don't blame me, I didn't vote for either of them!
Let's cut out all the hype: AMD is working on a new GPU. It is expected to be faster than the current one, and they would very much like us to believe it will be better than the competition.
Can I run Perl scripts on it? Can GPGPU routines safely coexist with games? Can they please just agree on a common interface so I don't get locked in to the GPU-of-the-month and these companies' tendencies to break backward compatibility on an annual basis?
The day I can use GPGPU like any other processor, is the day I start giving a damn about GPGPU. For now, just shut the fuck up and give me a video card that doesn't suck.
The GTX260 has 32 dedicated double precision processors
No version of the GTX 260 has this number of dedicated double-precision units. In both the 192- and 216-unit versions, 1 of the 8 units in each core can do one double-precision MADD per cycle.
So:
GTX 260: 192/8 = 24 cores, thus 24 double-precision units
GTX 260 216SP: 216/8 = 27 cores, thus 27 double-precision units
Even the GTX 280 has only 30 such units:
240/8 = 30
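The counts above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this comment (one double-precision unit per 8-unit core); the names `UNITS_PER_CORE` and `dp_units` are made up for illustration.

```python
# Assumption taken from the comment above: each core groups 8 stream
# processors and contributes exactly one double-precision unit.
UNITS_PER_CORE = 8


def dp_units(stream_processors: int) -> int:
    """Number of double-precision units for a given stream processor count."""
    cores, remainder = divmod(stream_processors, UNITS_PER_CORE)
    if remainder:
        raise ValueError("stream processor count must be a multiple of 8")
    return cores


# Stream processor counts quoted in the thread.
cards = {"GTX 260": 192, "GTX 260 216SP": 216, "GTX 280": 240}
dp_table = {name: dp_units(sp) for name, sp in cards.items()}

for name, units in dp_table.items():
    print(f"{name}: {units} double-precision units")
```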
The GTX260 also comes with more streaming(single precision) processors.
The terminology for both the G8x/G9x series and the GT200/D10U series is to call all the units "streaming processors" without distinction between single- and double-precision ability; perhaps you're just clarifying that the GTX 260 has more units [as in, is more capable] even if we only consider single-precision units.
My rough understanding is that those double precision processors are roughly equal to 1.5x a Q6600(quad core), or 6 cores.
1) It depends heavily on the job (compute- or memory-bound? I'm not familiar with Pyrit, but table generation for cipher attacks sounds memory-bound)
2) Those particular numbers are off for both single- and double-precision.
From the above numbers, the maximum theoretical double-precision rate for GTX 260-216SP is ~67 GFLOPS, and stock memory is 896MB @ ~112GB/s. Real-world is about half that because you seldom get both instructions per cycle and only reliably get 1. Single-precision is 3 instructions/clock theoretical but 2 for practical, and 8 times the units, so 67*(3instr/2instr. = 1.5)*8 ~= 805 GFLOPS.
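For anyone who wants to check those figures, here is a rough sketch of the peak-rate arithmetic. The 1.242 GHz shader clock is my assumption (the stock GTX 260 shader clock; it is not stated in this thread), and all variable names are illustrative.

```python
# Assumption: stock GTX 260 shader clock in GHz (not given in the thread).
SHADER_CLOCK_GHZ = 1.242
DP_UNITS = 27          # GTX 260 216SP: 216 / 8
FLOPS_PER_MADD = 2     # one multiply + one add per MADD

# Theoretical double-precision peak: one MADD per unit per cycle.
dp_peak = DP_UNITS * FLOPS_PER_MADD * SHADER_CLOCK_GHZ

# "Real-world is about half that" per the comment above.
dp_realistic = dp_peak / 2

# Single precision: 3 instructions/clock instead of 2, and 8 times the units.
sp_peak = dp_peak * (3 / 2) * 8

print(f"DP peak:      {dp_peak:.1f} GFLOPS")
print(f"DP realistic: {dp_realistic:.1f} GFLOPS")
print(f"SP peak:      {sp_peak:.1f} GFLOPS")
```

The single-precision figure is the same as 216 units * 3 FLOPs * 1.242 GHz, which is just another way of writing the 67 * 1.5 * 8 shortcut.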
The Q6600 Kentsfield is 2400MHz/1066MHz; we'll be generous for the sake of memory bandwidth and talk about the Xeon E5420 at 2500MHz/1333MHz. Each core can theoretically perform 4 FLOPS whether single- or double-precision, and we have 4 of them, running at the known frequency, so theoretical max is 80 GFLOPS, and not quite so hard to exploit because it and its compilers were engineered to give good performance on poorly-behaved data, unlike GPUs (and their hamstrung, underdeveloped compilers).
Since we're using Socket-F we can saturate the FSB: (2*64-bit channels) * (2*667MHz interleaved per channel) ~= 21.3GB/s theoretical, which is rather optimistic but of efficiency comparable to a graphics card's device memory's.
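The bandwidth figure is just channels * bytes per transfer * transfer rate. A minimal sketch, using only the numbers in this comment plus the ~112GB/s GPU figure quoted earlier (the variable names are mine):

```python
CHANNELS = 2
BYTES_PER_TRANSFER = 64 // 8        # 64-bit channel = 8 bytes
TRANSFERS_PER_SEC_M = 2 * 667       # 2 * 667MHz interleaved, in millions/s

# MB/s -> GB/s (decimal units, as used in the thread).
fsb_gb_per_s = CHANNELS * BYTES_PER_TRANSFER * TRANSFERS_PER_SEC_M / 1000

GPU_GB_PER_S = 112                  # GTX 260 device memory, from the thread
ratio = GPU_GB_PER_S / fsb_gb_per_s

print(f"FSB theoretical: {fsb_gb_per_s:.1f} GB/s")
print(f"GPU device memory is roughly {ratio:.1f}x that")
```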
If your task can be done with single-precision, is memory-bound and fits in about 1GB per job, the GTX 260 compares favorably to a Xeon: 800 (500, say) vs 80 (60, say) GFLOPS, and 112 (85, say) vs 21 (15, say) GB/s to memory. For double-precision it drops to 67 (33, say) GFLOPS so there's no clear-cut speed advantage and you should either use AMD's current offerings or stick to using a standard CPU (or Clearspeed or a proper supercomputer if you're made of money).
If your dataset doesn't fit AMD's or Nvidia's GPU device memory (1~3GB), you pretty much have to go with a CPU or supercomputer since you can bolt 64GB or 128GB to your dual quad-core Xeon system, and your code will be vastly more portable for free.
No idea why I said 2 units * (1 MUL + 1 ADD) * 4 cores * 2500MHz = 80GFLOPS (!?) for the Kentsfield/Harpertown; obviously it's 40GFLOPs, and obviously the analysis holds.
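To make the corrected number easy to verify, here is a small sketch of the same product, compared against the GPU figures quoted earlier in the thread (805 and 67 GFLOPS theoretical). The function name `cpu_peak_gflops` is purely illustrative.

```python
def cpu_peak_gflops(units_per_core: int, flops_per_unit: int,
                    cores: int, clock_ghz: float) -> float:
    """Theoretical peak: units * FLOPs per unit per cycle * cores * clock."""
    return units_per_core * flops_per_unit * cores * clock_ghz


# 2 units * (1 MUL + 1 ADD) * 4 cores * 2500MHz, as in the comment above.
cpu_peak = cpu_peak_gflops(units_per_core=2, flops_per_unit=2,
                           cores=4, clock_ghz=2.5)

# GPU theoretical peaks as quoted earlier in the thread.
GPU_SP_PEAK = 805
GPU_DP_PEAK = 67

print(f"CPU peak: {cpu_peak:.0f} GFLOPS")
print(f"GPU/CPU single precision: {GPU_SP_PEAK / cpu_peak:.1f}x")
print(f"GPU/CPU double precision: {GPU_DP_PEAK / cpu_peak:.1f}x")
```

With 40 instead of 80 the single-precision gap only gets wider, and the double-precision advantage is still under 2x on paper.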
Some more information on how RV7x0 calculates 64-bit floating point:
All shader processors in RV7x0 are natively 32-bit. There are 5 ALUs in each shader processor. When RV7x0 calculates a 64-bit MUL operation, it does it by using 4 of those 32-bit ALUs together. When RV7x0 calculates a 64-bit ADD operation, it combines 2 32-bit ALUs together.
That's why RV7x0 has a 64-bit floating point MUL throughput of 1/5 of its 32-bit MUL throughput. There is no "group of 64-bit ALUs" like the article thinks.
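A toy model of that ganging, in case the ratio isn't obvious. It assumes the leftover ALUs in a shader processor sit idle when they can't form a complete 64-bit group; that assumption and all the names below are mine, and the 2/5 figure for ADD simply falls out of the model rather than being stated above.

```python
from fractions import Fraction

ALUS_PER_SHADER_PROCESSOR = 5

# 32-bit ALUs ganged together per operation, per the description above.
ALUS_PER_OP = {"op32": 1, "mul64": 4, "add64": 2}


def ops_per_cycle(op: str) -> int:
    """Whole operations one shader processor can issue per cycle.

    Assumption: ALUs that cannot form a complete group stay idle.
    """
    return ALUS_PER_SHADER_PROCESSOR // ALUS_PER_OP[op]


def relative_throughput(op: str) -> Fraction:
    """Throughput of `op` relative to plain 32-bit operations."""
    return Fraction(ops_per_cycle(op), ops_per_cycle("op32"))


for op in ("mul64", "add64"):
    print(f"{op}: {relative_throughput(op)} of 32-bit throughput")
```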