ATI's 1GB Video Card
Signify writes "ATI recently released pics and info about it's upcoming FireGL V7350 graphics card. The card features 1GB of GDDR3 Memory and a workstation graphics accelerator. From the article: 'The high clock rates of these new graphics cards, combined with full 128-bit precision and extremely high levels of parallel processing, result in floating point processing power that exceeds a 3GHz Pentium processor by a staggering seven times, claims the company.'"
- You have an OpenGL driver for the GPU and a JIT for shader language programs. This means you can completely throw out the instruction set between minor revisions if you want to. An x86 CPU must mimic bugs in the 486 to be compatible with software that relied on them.
- You have an easy problem. Graphics processing is embarrassingly parallel. You can pretty much render every pixel in your scene independently[1]. This means that you can almost double the performance simply by doubling the number of execution units. To see how well this works for general purpose code, see Itanium.
- The code you are running is fairly deterministic and unbranching. Up until a year or two ago, GPUs didn't even support branch instructions. If you needed a branch, you executed both code paths and threw the result you didn't need away. Now, branches exist, but they are very expensive. This doesn't matter, since they are only used every few thousand cycles. In contrast general purpose code has (on average) one branch every 7 cycles.
GPUs and CPUs are very different animals. If all you want is floating point performance, then you can get a large FPGA and program it as an enormous array of FPUs. This will give you many times the maximum theoretical floating point throughput of a 3GHz P4, but will be almost completely useful for over 99% of tasks.[1] True of ray tracing. Almost true of current graphics techniques.
I am TheRaven on Soylent News
This is a workstation card, not a games card. The people buying this are likely to be either CAD/CAM people with models that are over 512MB (the workstation it plugs into will probably have a minimum of 8GB of RAM), or researches doing GPUPU things. To people in the second category, it's not a graphics card it's a very fast vector co-processor (think SSE/AltiVec, only a lot more so).
I am TheRaven on Soylent News
Appreciate the joke, but for folks out there who think he is serious, Microsoft has said that the Intel GMA 900 and ATI Radeon X200 are the minimum graphics cards for using the "new" DirectX GUI. Vista will work on computers with less graphics systems, but in a compatability mode similar to Windows XP's GUI.
However, how difficult would it be to write an operating system that offloaded floating point operations to the GPU, and everything else to the CPU.
Funny you should mention that. The Intel 386 (and up) architecture has built in support for a floating point coprocessor, so it can offload floating point operations. In the early days, you could buy a 387 math coprocessor to accelerate floating point performance. Then Intel integrated the 387 coprocessor onto the 486 series cpus, and today we just know it as "the floating point unit" (although it's been much revised, parallelized, and super-scaled).
As for offloading to a GPU, well... that's what we do today. It's called Direct3D, or Mesa, or Glide, or your favorite 3D acceleration library. The problem with this approach is that it requires very specialized code. It's not something that can be automatically done for just any code, as the overhead of loading the GPU, setting up the data, and retrieving the results would far exceed the performance gains. In only extereme cases does it pay off: the workload has to be extremely parallelizable, with almost no branching and predictable calculations. Basically what it ends up is that the algorithm has to be extensively tailered to the GPU. Even IBM has had major issues offloading general purpose operations to their special processing units, and those are much more closely coupled to the CPU.