Nvidia Working on a CPU+GPU Combo
Max Romantschuk writes "Nvidia is apparently working on an x86 CPU with integrated graphics. The target market seems to be OEMs, but what other prospects could a solution like this have? Given recent developments with projects like Folding@Home's GPU client, you can't help but wonder about the possibilities of a CPU with an integrated GPU. Things like video encoding and decoding, audio processing and other applications could benefit a lot from a low-latency CPU+GPU combo. What if you could put multiple chips like these in one machine? With AMD+ATI and Intel's own integrated graphics, will basic GPU functionality be integrated in all CPUs eventually? Will dedicated graphics cards become a niche product for enthusiasts and pros, like audio cards already largely have?" The article is from the Inquirer, so a dash of salt might make this more palatable.
- Memory Management Units. Even among microcomputers, some (old m68k machines) have an off-chip MMU, and some, like the 8086, just don't have one.
- Floating Point Units. The 80486 was the first x86 chip to put one of these on-die.
- SIMD units. Formerly only found in high-end machines as dedicated chips, now on a lot of CPUs.
- DSPs. Again, formerly dedicated hardware, now found on-die in a few of TI's ARM-based cores.
A GPU these days is very programmable. It's basically a highly parallel stream processor. Integrating it onto the CPU makes a lot of sense.
I am TheRaven on Soylent News
http://www.cap-lore.com/Hardware/Wheel.html
This is known as the wheel of reincarnation, and has come up several times in the last forty years of graphics hardware.
Moving it right up next to the CPU will allow data to flow between the two a lot faster than it does now, where it has to go over a bus... they can finally get rid of the bottlenecks that have been around since the two were separated.
I'm not good at making signatures...
When I saw this headline I immediately thought of this article, an interview with Jen-Hsun Huang (CEO: nVidia) by Wired dated July '02. In it, the intention of overthrowing Intel is made quite clear, and ironically enough they even mention the speculation from a time when it was rumored that nVidia and AMD would merge.
It's actually a very good article for those interested in nVidia's history and Huang's mentality. Paul Otellini ought to be afraid. Very afraid.
I am an old-school programmer, so I tend to use ints a lot. The sad truth is that floats using SSE are as fast as, and sometimes faster than, the old tricks we used to avoid floats!
Yes, we live in an upside-down world where floats are sometimes faster than ints.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Yes. It absolutely matters. It makes a huge difference in image quality.
It matters when we go to sample textures, it matters when we enable AA, it matters.
I currently have no clever signature witticism to add here.
No, it doesn't. Note that I said pixel, not coordinate.
The coordinates should be as accurate as possible, but having a pixel more accurate than twice the resolution of the display serves very little purpose.
A guy from Intel recently presented at a seminar at my university. He is working with a group that is pushing for a CPU architecture that looks kind of like a GPU when you look at it at a very high level (and perhaps with your eyes squinted just a bit).
The unofficial title of his talk was 'the war going on inside your PC'. He argued that the design of future CPUs and GPUs will eventually converge, with future architectures being comprised of a sea of small and efficient but tightly interconnected processors (no superscalar), and that it is basically a race to see who will get there first - the CPU manufacturers or the GPU manufacturers.
One of his main points was that, with increased compiler effort, many computational workloads can potentially be made to run on a tiled architecture of simple processors. Graphics rendering went through the same shift: it was originally better suited to a typical CPU, where control-dependent loads run efficiently, but it has been reshaped into a workload that can leverage the 'tiles of simple processors' found in a graphics card today. When a workload cannot be mapped to the 'tiles of simple processors' architecture, just slap a superscalar processor in the corner of your die (like Nvidia seems to be doing) to take care of those corner cases.
So, we will likely be seeing a lot more of this in the future. Especially now that AMD and ATI are together.
(More details on the abstract of the presentation I mentioned can be found here)
Why are multiprocessor units suddenly so popular, given that e.g. the Voodoo graphics cards failed? I remember them being ridiculed and ending up in the performance backwaters with their 2-4-8(-16) multiprocessor cards, but are there engineering reasons why multiple processors are now suddenly coming into favour?
Multiple processors (CPU, GPU or otherwise) are a way to add more 'cycles' based on current technology. This has the advantage of getting more out of your current designs and manufacturing technology, but comes at the cost of increased complexity in both the supporting hardware and the software.
Making a single-core implementation faster is always the more efficient way to add processing capacity, but it becomes very impractical beyond a certain point due to power and heat considerations. (Where that point is exactly depends on the state of technology at any given moment, but in the end it is limited by the physical size of molecules, at least as far as current technology goes.)
So multiple processors are not directly better from an engineering point of view; rather, they are a way to overcome the speed limits of current technology, provided you can deal with the extra complexity. (Moving much of the hardware complexity into the chip itself, like AMD and Intel are doing now, removes the burden from system board designers, but the complexity itself is still there, especially on the software side of the picture.)
With regards to 3dfx, it seems to me that:
1. They failed to manage the additional complexity
2. As their competition showed, the limits of technology at that time were much higher than what 3dfx managed, which indicates there were problems with either their design or manufacturing technology, or more likely, with both.
That's sort of the point of building them on the same die. You can't just run a wire to it, as it would be quite slow. Wires tend to have parasitic inductances and capacitances, so the setup and hold times on the lines would be too large to provide a benefit.
System RAM is SLOW compared to GPU RAM. PCIe actually allows very high speed access to system RAM, but the RAM itself is too slow for GPUs. That's one of the reasons their RAM amounts are so small: they use higher-speed and thus more expensive RAM. Also, because of the speed, you end up dealing with cooling and signal issues, which makes it impractical (or perhaps impossible) to simply stick it in add-on slots to allow for upgrades.
Even as fast as it is, it's still slower than the GPU would really like.
What you've suggested is already done by low end accelerators like the Intel GMA 950. Works ok, but as I said, slow.
Unless you are willing to start dropping serious amounts of cash on system RAM, we'll be needing to stick with dedicated video RAM here for some time.
Typically, unimplemented instructions cause an exception. The operation can then be emulated in software.
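To make the trap-and-emulate idea concrete, here is a toy sketch in Python. It is not how any real CPU or OS does it: the "hardware" is just a dictionary, and the names IllegalInstruction, HARDWARE_OPS, SOFTWARE_HANDLERS and run are all made up for illustration. The point is only the control flow: try the instruction, take an exception if it is not implemented, and let a software handler produce the result instead.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it has no hardware for."""


# Operations the toy "hardware" implements natively (hypothetical).
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def emulate_mul(a, b):
    """Software fallback: multiply by a non-negative b using repeated addition."""
    result = 0
    for _ in range(b):
        result = HARDWARE_OPS["add"](result, a)
    return result


# Handlers invoked after a trap (hypothetical).
SOFTWARE_HANDLERS = {"mul": emulate_mul}


def execute(op, a, b):
    """Run one instruction on the 'hardware', trapping if it is unimplemented."""
    if op not in HARDWARE_OPS:
        raise IllegalInstruction(op)
    return HARDWARE_OPS[op](a, b)


def run(op, a, b):
    """Try the hardware first; on a trap, fall back to the software handler."""
    try:
        return execute(op, a, b)
    except IllegalInstruction:
        return SOFTWARE_HANDLERS[op](a, b)


print(run("add", 2, 3))  # handled by the 'hardware'
print(run("mul", 6, 7))  # traps, then emulated in software
```

The caller of run never sees the difference, apart from the emulated path being much slower.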
Intron: the portion of DNA which expresses nothing useful.
1) Processors are wicked fast at floating point these days. Have a look at the benchmarks a modern chip using SSE2 can do some time. Integer doesn't inherently mean faster, and chips these days have badass FPUs.
2) For many things, it DOES make a difference. You might ask why we need more than 24-bit (or 32-bit if you count the alpha channel) integer colour. After all, it's enough to look highly realistic. Yes, well, that's fine for a final image, but you don't want to do the computation like that. Why? Rounding errors. You find that with iterative things like shaders, doing them in integer adds up to nasty errors, which means nasty colours and jaggies. There's a reason why pro software does it as 128-bit FP (32 bits per colour channel) and why cards are now going that way as well.
3) In modern games, everything is handled in the GPU anyhow. The CPU sends over the data and the GPU does all the transform, lighting, texturing and rasterizing. The CPU really is responsible for very little. With vertex shaders the GPU even handles a good deal of the animation these days. The reason is that not only is it more functional but it's waaaaay faster. You can spend all the time you like trying to make a nice optimised integer T&L path in the CPU, the GPU will blow it away. You actually find that some older games run slower than new ones because they rely on the CPU to do the initial rendering phases like T&L before handing it off, whereas newer games let the GPU handle it and thus run faster despite having higher detail.
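The rounding-error point in 2) can be shown with a deliberately simple toy, sketched here in Python. It is not a real shader: the function names are made up, and the "effect" is just four halvings followed by four doublings, chosen because floating point represents those steps exactly. The integer path truncates after every step, the way an 8-bit intermediate buffer would.

```python
def roundtrip_int8(value, passes):
    """Halve then double an 8-bit channel, truncating to an integer after every step."""
    v = value
    for _ in range(passes):
        v = v // 2               # intermediate result stored as an integer
    for _ in range(passes):
        v = min(255, v * 2)      # clamp to the 8-bit range
    return v


def roundtrip_float(value, passes):
    """Same sequence of steps, but intermediates are kept in floating point."""
    v = float(value)
    for _ in range(passes):
        v = v * 0.5
    for _ in range(passes):
        v = v * 2.0
    return v


gradient = range(192, 208)  # 16 neighbouring shades of one channel
int_result = sorted({roundtrip_int8(c, 4) for c in gradient})
float_result = sorted({roundtrip_float(c, 4) for c in gradient})

print(int_result)         # distinct shades left after the integer path
print(len(float_result))  # distinct shades left after the float path
```

With integer intermediates all 16 shades collapse into a single value (192), which is exactly the kind of banding you see on screen. The float path hands back all 16 shades unchanged.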
I believe the last option (option 7) is what x86/87 CPU/FPU combo actually used. That's why there is a coprocessor-prefix in front of the FPU instructions. They are not just unused opcodes.
Option 5 (and sometimes even 3) is commonly used for MMX/3dNOW/SSE/SSE2/SSE3/whatever instructions.
Unless they *really* need nonportable features, most programmers tend to go with option 2.
Yes, the SYSTEMS Tom used to test have normal-speed RAM for systems. Duh. The graphics cards, however, have much faster RAM. For example, my system at home has DDR2-667 RAM. That's spec'd to run at 333MHz, which is 667MHz in DDR RAM speak. My graphics card, a 7800GT, on the other hand has RAM clocked at 600MHz, or 1200MHz in RAM speak.
Not a small difference, really. My system RAM is rated to somewhere around 10GB/second max bandwidth (it gets like 6 in actuality). The graphics card? 54GB/sec.
Video cards have fast RAM subsystems. They use fast, expensive chips and they have controllers designed for blazing fast (and exclusive) access. You can't just throw normal, slow, system RAM at it and expect it to perform the same.
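For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python of theoretical peak bandwidth: transfers per second times bytes per transfer times number of channels. The bus widths are assumptions, not something stated above: dual-channel 64-bit DDR2 for the system, and a 256-bit memory bus on the graphics card. The function name is made up.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


# DDR2-667: 667 MT/s, assumed 64-bit channels in a dual-channel setup.
system_ram = peak_bandwidth_gb_s(667, 64, channels=2)

# Graphics RAM at 1200 MT/s effective, on an assumed 256-bit bus.
video_ram = peak_bandwidth_gb_s(1200, 256)

print(f"system RAM: {system_ram:.1f} GB/s")
print(f"video RAM:  {video_ram:.1f} GB/s")
```

Under those assumptions the system RAM comes out at about 10.7 GB/s, matching the "around 10GB/second" figure, and the graphics RAM at about 38.4 GB/s. The 54GB/sec figure would need a faster memory clock or a wider bus than assumed here, but either way the gap between the two is several-fold.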
When Timna was initially finished, RAMBUS was still so expensive that Timna's release had to be delayed so that a (PC100-to-RAMBUS) memory translator could be added. Those of us who followed chipsets back then know how badly RAMBUS and memory translators bombed. The integrated RAMBUS memory controller had to be the biggest reason Timna was cancelled. This might also be a reason Intel doesn't integrate a memory controller onto their current CPUs.
Interestingly, Timna was the first project of Intel's new Israeli design team. Not a great start, but their second project was pretty darned good (Pentium M/Centrino).
TO START
PRESS ANY KEY
Where's the 'ANY' key? I see Esk, Kitarl, and Pig-Up...
Having the GPU on the same chip or die as the CPU would reduce the latency by several orders of magnitude and allow a much higher clock for the bus between the two. The memory access could also be improved dramatically, depending on how it was implemented.
I think the first example of this integration we see will use the HyperTransport bus and a single package with CPU and GPU on different dies, though fabbed on the same process. This could be done with an existing AMD socket and motherboard.
Before this happens, though, I think we will see graphics cards on HTX slots. For those who do not know, HTX slots were introduced in a recent revision of the HyperTransport standard. They allow an add-in card to communicate with the CPU with much lower latency and higher bandwidth than PCIe, and no controller in between. The add-in card could even have another CPU on it, and the performance would be comparable to current AMD SMP systems. A GPU on an HTX card could have its own RAM, and be able to access system RAM much faster than PCIe allows. The neat thing is that with HT, the CPU would probably be able to use the graphics RAM as though it were system RAM.
Note that Nvidia is a member of the HyperTransport Consortium due to their chipset business, and they could easily have HTX cards in their labs right now.
You'd be mistaken. See the slide on Texture Mapping.
Perspective divide is performed before texture sampling. This is necessary to get proper texture step sizes, for correct sampling of the texture onto the pixel.
Fractional pixel locations are also used in antialiasing.
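A minimal sketch of why the divide has to happen before the texture lookup, in Python. The function names and sample numbers are made up, and w here stands for the clip-space w at each end of a span. Interpolating the texture coordinate u directly in screen space gives the wrong answer under perspective. Interpolating u/w and 1/w and dividing per pixel gives the right one.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def affine_u(u0, u1, t):
    """Interpolate u directly in screen space (wrong under perspective)."""
    return lerp(u0, u1, t)


def perspective_u(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w in screen space, then divide per pixel."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# A span whose far end is three times as deep as its near end.
u_near, w_near = 0.0, 1.0
u_far, w_far = 1.0, 3.0

t = 0.5  # the pixel halfway across the span on screen
print(affine_u(u_near, u_far, t))
print(perspective_u(u_near, w_near, u_far, w_far, t))
```

Halfway across the span on screen, the affine version samples the middle of the texture (u = 0.5), while the perspective-correct version samples at about u = 0.25, because the near part of the surface covers more pixels. The fractional part of that result is what feeds texture filtering.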