In that case, I apologize for the misunderstanding. Of course, the video output would still need a buffer, just like every other I/O device. But that buffer is minuscule compared to the amount of RAM that goes on a video card today, when the card is so much more than an I/O device. Before GPUs went onboard, video cards carried framebuffers of about 8-16MB for uncompressed data.
It is worth noting, however, that the concept you called "dual outputs" is essentially how most system-on-a-chip devices work: I/O devices are simply mapped to a specific address range in main memory, and that region is their buffer. I can't vouch for how common this is in higher-performance devices, but I'm sure it's not unheard of. In this case, though, it's quite undesirable. Video output should be just a normal I/O operation...just with higher bandwidth requirements than Ethernet or FireWire. Keeping to the normal I/O device philosophy, the CPU can do software rendering into the video card's framebuffer without a dedicated GPU, or delegate that task to one. Thankfully, the framebuffer needs considerably less than 36MB even for triple buffering, because the frames are stored using good compression algorithms. To reduce bandwidth requirements, it would be a good idea to standardize that compression (while keeping it in firmware so it can be upgraded), which lowers the bandwidth needed between the processor cluster and the video output. Uncompressed throughput for 60fps at your mentioned resolution would be ~2GB/s...more than even the best GPUs today can handle on top of the other memory accesses they require.
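For anyone who wants to redo the framebuffer arithmetic, here's a minimal Python sketch. It only multiplies out width x height x bytes per pixel, then scales by the number of buffers or the frame rate; the example resolution, the 4 bytes per pixel, and the 2:1 compression ratio are hypothetical placeholders, since the resolution from the earlier post isn't restated here.

```python
# Back-of-the-envelope framebuffer arithmetic.
# All concrete numbers used with these helpers (resolution, bytes per pixel,
# compression ratio) are hypothetical; the post does not state them.


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def buffer_bytes(width: int, height: int, bytes_per_pixel: int,
                 buffers: int = 3, compression_ratio: float = 1.0) -> float:
    """Memory for `buffers` frames (3 = triple buffering), divided by an
    assumed compression ratio (1.0 = uncompressed)."""
    return frame_bytes(width, height, bytes_per_pixel) * buffers / compression_ratio


def throughput_bytes_per_s(width: int, height: int, bytes_per_pixel: int,
                           fps: int = 60, compression_ratio: float = 1.0) -> float:
    """Bytes per second needed to deliver `fps` frames to the video output."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / compression_ratio


if __name__ == "__main__":
    # Hypothetical example: 1920x1200 at 4 bytes per pixel.
    w, h, bpp = 1920, 1200, 4
    print(f"one frame:       {frame_bytes(w, h, bpp) / 1e6:.1f} MB")
    print(f"triple buffer:   {buffer_bytes(w, h, bpp) / 1e6:.1f} MB")
    print(f"  with 2:1 comp: {buffer_bytes(w, h, bpp, compression_ratio=2.0) / 1e6:.1f} MB")
    print(f"60fps, raw:      {throughput_bytes_per_s(w, h, bpp) / 1e9:.2f} GB/s")
```

Plug in the actual resolution and pixel depth under discussion to check the 36MB and ~2GB/s figures; the compression ratio is whatever a standardized scheme would actually achieve.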
I'm also thinking about dual GPUs working in tandem on the same output, where a dedicated I/O buffer still makes the most sense (rendering alternate frames is the most easily implemented method currently in use)...and I'm rambling again, for which I apologize. I know, I know...I should start a weblog so that everyone who cares can gather in one place and criticize me all at once. :P I have to stop now.
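To make the dual-GPU case concrete, here's a rough Python sketch of alternate frame rendering feeding one shared output buffer. It assumes frame i goes to GPU i mod 2 and that the GPUs can finish out of order, so the buffer holds frames until the display can scan them out in sequence. SharedOutputBuffer and afr_schedule are hypothetical names for illustration, not any vendor's driver interface.

```python
import heapq


def afr_schedule(num_frames: int, num_gpus: int = 2) -> list[tuple[int, int]]:
    """Alternate frame rendering: frame i is assigned to GPU i % num_gpus."""
    return [(frame, frame % num_gpus) for frame in range(num_frames)]


class SharedOutputBuffer:
    """Hypothetical dedicated I/O buffer that all GPUs write finished frames into.

    With alternate frame rendering, GPU 1 may finish frame 1 before GPU 0
    finishes frame 0, so frames are held here until the display can scan
    them out in order.
    """

    def __init__(self) -> None:
        self._pending: list[tuple[int, str]] = []  # min-heap keyed by frame number
        self._next_frame = 0  # next frame the display expects

    def submit(self, frame_number: int, pixels: str) -> None:
        heapq.heappush(self._pending, (frame_number, pixels))

    def scan_out(self) -> list[tuple[int, str]]:
        """Pop every frame that is ready to display, in display order."""
        ready = []
        while self._pending and self._pending[0][0] == self._next_frame:
            ready.append(heapq.heappop(self._pending))
            self._next_frame += 1
        return ready


if __name__ == "__main__":
    gpu_of = dict(afr_schedule(4))
    # Pretend the GPUs finish frames in this out-of-order sequence.
    completion_order = [1, 0, 3, 2]
    buf = SharedOutputBuffer()
    for frame in completion_order:
        buf.submit(frame, f"frame {frame} from GPU {gpu_of[frame]}")
        for _, pixels in buf.scan_out():
            print("display:", pixels)
```

The point of the single buffer is that neither GPU needs to know about the other; each just writes its finished frame, and the output side handles ordering.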
That's a rather short-sighted point. Everything, the CPU especially, benefits from faster memory access just as much as the GPU does, and applying the technology currently invested in dedicated GPU memory across the whole system costs at most a constant factor, O(c), in performance. I seriously doubt that GPU- and physics-hungry processing will ever account for less than two thirds of overall memory I/O (barring large file transfers, network services, and other things that do not run alongside rendering- or physics-hungry operations).
It's a one-time performance hit with an everlasting payoff. If you have a hard time seeing the potential payoff, consider the new nVidia processors and the benefit they would get from higher-bandwidth inter-processor communication for that shiny new API that offloads more vector-hungry work for non-rendering purposes. A GPU as an actual standalone specialized processor has radically higher potential than one inherently designed to have data dumped on it and never seen again.
To gain further perspective, recall that we're cutting GPU performance by considerably less than one half. Even when all physical limitations have been reached, bandwidth can still be doubled by doubling the data bus width. Bandwidth is bounded by a constant too, and it's a bigger one than the inverse of the one-time performance hit. Latency is not even an issue, as it would be unaffected, so long as each processor socket gets its own channel of data access, capable of operating simultaneously on mutually exclusive memory spaces (worst case: latency would double and bandwidth would halve for thread-locked data thrashing between processors, which is still a far cry better than communicating across PCIe). Since adding channels is effectively doubling data bus width, it would be more reasonable to expect one socket to have a dedicated memory channel, with another channel shared among the remaining sockets...the intention being that the most important function (the GPU, for games or other rendering tasks) gets the dedicated socket. AMD's onboard northbridge could seem like a problem, but it's actually a minor implementation detail. The "master processor" could easily be responsible for forwarding additional channels of memory I/O to other devices and enforcing data consistency...that's not exactly a new job for it. In fact, the system I'm describing is radically simpler and more direct than the one currently in use, provided multiprocessor capability is already present (which it is). It may seem more complicated, but only because I got somewhat wrapped up in spelling out some of the inane details.
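The bandwidth arithmetic behind that paragraph can be sketched in a few lines of Python. It's a minimal model assuming peak bandwidth is simply bus width in bytes times transfers per second; the 256-bit width, the 2e9 transfers/s rate, and the three-socket layout are hypothetical numbers, and per_socket_bandwidth is an illustrative helper for the "one dedicated channel, one shared channel" idea, not a real API.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): bytes per transfer times transfers
    per second. Ignores refresh, protocol overhead, and contention."""
    return bus_width_bits / 8 * transfers_per_s / 1e9


def per_socket_bandwidth(channel_gb_s: float, num_sockets: int) -> list[float]:
    """Hypothetical layout: socket 0 (e.g. the GPU) gets a dedicated channel,
    and the remaining sockets split a second channel evenly."""
    if num_sockets < 2:
        raise ValueError("need one dedicated socket plus at least one sharing socket")
    shared_share = channel_gb_s / (num_sockets - 1)
    return [channel_gb_s] + [shared_share] * (num_sockets - 1)


if __name__ == "__main__":
    rate = 2e9  # hypothetical effective transfer rate (transfers per second)
    today = peak_bandwidth_gb_s(256, rate)    # hypothetical 256-bit GPU bus
    doubled = peak_bandwidth_gb_s(512, rate)  # same rate, bus width doubled
    print(f"256-bit: {today:.0f} GB/s, 512-bit: {doubled:.0f} GB/s")
    # Split the 512-bit bus into two 256-bit channels across three sockets.
    for socket, bw in enumerate(per_socket_bandwidth(today, num_sockets=3)):
        print(f"socket {socket}: {bw:.0f} GB/s")
```

This only captures the even-split steady state; the worst case described above (thread-locked data thrashing between sockets) would cut effective bandwidth further, which a model this simple doesn't try to represent.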
Bottom line: if we're willing to double the current GPU memory bus width, split it into 2 channels (one dedicated to the GPU), and use it (with GDDR-whatever-the-latest-version-is RAM) as the model for our main memory, the very first system of this design will already boast a performance increase and serve as a powerful platform for much more improvement yet. With current graphics cards, a lot of proprietary technology is bundled in one gelatinous package with no interchangeable parts. It's time for that technology to branch into segments where individual components can shine on their own merit, and where more companies can become part of the graphics and specialized-processing landscape. It's almost an open source argument for hardware: an open system is inherently better at making full use of human ingenuity.
In my opinion, it's time for specialized secondary processors in general (read: not general-purpose, and not necessarily x86-based) to shine and become more commonplace.
Having this "pin-compatible GPU" doesn't mean no video card. Rather, it means the video card goes back to being merely a display output device. Obviously, dedicated graphics RAM would disappear altogether (since its technology would become main memory), while the other display chips, such as the RAMDACs, would remain on the video card. But that beast of a GPU would get a nice socket, HSF, and all the RAM it wants, plus CPUs would get GPU-class memory access performance.
To top it off, your next "video" upgrade would cost only as much as the GPU itself...sans the RAM, output processing, PCB, and cooling solution.
Also, your next RAM upgrade would benefit your WHOLE system, whether in size or performance, and such an interchangeable system could use its extra slots for graphics, physics, hardware acceleration of secure sockets, or whatever non-core function your machine most needs. The "CPU" slot becomes the next standard for expansion, partially replacing PCIe, and "onboard video" becomes about as horrible a thing as onboard Ethernet.
That argument will wash when a set of algorithms exists whose effectiveness at inferring 3D data can actually hold a candle to what the human brain does automatically and subconsciously...for free, and in real time.
In the future, all image and 3D data will be stored in a single pixel. Complex (and therefore believable) algorithms will extrapolate/interpolate all past, present, and future works of man.
Believe it, baby.
It'll be just like dark matter, one pound of which weighs over 1 million tons.
[Services For Unix] allows Windows to act as a server but not a client with respect to standard *nix protocols like NFS.
I use SFU solely for enabling my Windows boxes to connect to NFS shares...so what are you talking about?
That's hardly a concern when the converted cars are still starting out as gasoline-burning vehicles purchased from your local dealership.