Because they draw less geometry and so make fewer calls per frame?
No, this simply isn't true. It's a SIMD architecture that happens to be very good for rasterisation, and any other task that involves executing the same kernel across a regular set of parameters. It's not set that way specifically to support rasterisation - it's because SIMD arrays can give an order of magnitude performance increase over a MIMD design.
GPUs stopped being designed as graphics pipelines several generations ago. What the industry is doing now is recycling supercomputing designs from the 80s and early 90s and shrinking them down onto a single die. The highest performance-to-cost ratio comes from single-purpose systolic arrays, which is what the first generations of graphics cards were. SIMD arrays have been used to accelerate more general scientific workloads for decades. MIMD architectures give the lowest performance-to-cost ratio, but provide the greatest flexibility.
Just because rasterisation is efficient on a SIMD architecture doesn't mean that it is the only application that is. In the short-term (2-3 years) we'll see some hybrid approaches where alternatives to rasterisation are executed on the current architectures. In the medium to longer term we'll start seeing arrays of MIMD cores on graphics cards, at which point the whole API will disappear. Larrabee was the first aborted attempt in that direction, but it is inevitable that the market will go that way.
It's a shame that you are being lambasted by posters who didn't understand your point. Yes - lifting the code to a higher level of abstraction would definitely enable specialisation. Managed code is one way to go, certification would be another. In either case, eliminating direct memory access, or proving that it is safe, would allow the removal of the barrier between hardware access and user-land code. That is precisely what is needed in this case.
We're not quite there yet, but you are right that it should be close. Take a GTX 580 for example. It can sustain 800 GFLOPS on certain code sequences. If we assume that real-time means 50fps and 1080p is the target resolution, then if we could average out the workload we'd hit roughly 8000 FLOPs per pixel per frame. That's certainly enough to do something interesting.
Sadly it doesn't work like that. We hit that huge performance number on a SIMD array with a really deep pipeline and partially manual cache management. As a result certain things are hard to do. Scene management is currently the painful thing to do as fitting a spatial division tree into memory in a way that those SIMD threads can access is really hard.
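For anyone who wants to check the per-pixel figure, here is a minimal sketch of the arithmetic. The 800 GFLOPS, 50fps and 1080p numbers are the ones quoted above; the function name is made up for illustration, and the result is only an average that assumes the work could be spread perfectly evenly, which is exactly what doesn't happen in practice.

```python
def flops_per_pixel_per_frame(sustained_flops, fps, width, height):
    """Average floating-point budget per pixel per frame.

    Assumes an idealised, perfectly even spread of work over all pixels.
    """
    return sustained_flops / (fps * width * height)


# Figures from the post: 800 GFLOPS sustained, 50 fps, 1920x1080.
budget = flops_per_pixel_per_frame(800e9, 50, 1920, 1080)
print(round(budget))
```

That comes out a little under the 8000 quoted, so the round number is a fair approximation.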
Your comments about exposing the functionality were correct a few hardware generations ago. They're no longer true, as the programming model exposed by CUDA isn't tied to rasterisation at all. But it is still tied to running kernels over regular problem instances, as that is what a SIMD architecture can do.
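To make "running kernels over regular problem instances" concrete, here is a rough sketch of the shape of that programming model. It is plain sequential Python, not CUDA, and the names (`kernel`, `run_over_grid`) are invented for the example; it only illustrates that every instance runs the same code and is identified purely by its position in a regular index space.

```python
def kernel(x, y):
    """The same instructions run for every (x, y); only the inputs differ."""
    return x * x + y * y


def run_over_grid(kernel_fn, width, height):
    # A regular index space: each instance is addressed only by its
    # coordinates. On real hardware these instances would execute in
    # lock-step groups rather than one after another as they do here.
    return [[kernel_fn(x, y) for x in range(width)] for y in range(height)]


grid = run_over_grid(kernel, 4, 3)
print(grid[2][3])
```

Anything that fits this pattern maps well onto the hardware; a pointer-chasing tree traversal, where each instance wants to do something different, does not.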
The whole 3dfx era was horrific, and as someone has already pointed out below, DirectX made a huge positive impact on PC gaming. The article describes a real problem though: if I want to hit 50fps then my rendering needs to execute in under 20ms. Performing 5k system calls to draw chunks of geometry means that each syscall needs to take less than 4us, or about 12000 cycles on a 3GHz processor. That is not a lot of time to do all of the internal housekeeping that the API requires and talk to the hardware as well.
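The budget above is simple enough to sketch in a few lines. The 50fps, 5000 calls and 3GHz figures are the ones from the paragraph; the function name is hypothetical, and the calculation generously assumes the whole frame time is available for draw calls.

```python
def per_call_budget(target_fps, calls_per_frame, clock_hz):
    """Return (seconds per call, CPU cycles per call) for a frame budget."""
    frame_time_s = 1.0 / target_fps            # 20 ms at 50 fps
    call_time_s = frame_time_s / calls_per_frame
    return call_time_s, call_time_s * clock_hz


call_time_s, cycles = per_call_budget(50, 5000, 3e9)
print(f"{call_time_s * 1e6:.1f} us per call, about {cycles:.0f} cycles")
```

Halving the number of calls doubles the budget for each one, which is why batching geometry matters so much.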
The solution is not to throw away the API. The interface does need to change drastically, but not to raw hardware access. More of the geometry management needs to move onto the card and that probably means that devs will need to write in some shader language. It's not really lower-level / rawer access to the hardware. It is more that shader languages are becoming standardised as a compilation target and the API is moving on to this new target.
I really don't get your point at all: in what way does the quote show him to be mathematically naive? His statement is perfectly correct and it is a good description of why mathematical knowledge advances monotonically. There is no purer demonstration of the love of learning for the sake of it than the pursuit of maths (applying for the prize is obviously an entirely different issue). It is a system of logic built from the ground up for no other reason than that it can be done; it reveals beauty and elegance, and one day may directly advance humanity in some undiscovered application.
[citation needed].
Since when is fusion a form of radioactive decay? It would also seem to be somewhat farfetched to describe the Big Bang as "radioactive decay".
That really sounds like the MHz myth to you? Do you even know what the MHz myth is?
Which part sounded like a propagation of the MHz myth? Was it the bit where I said that a lower clock speed was offset by a larger number of cores.... oh wait a minute, that would be the exact opposite...
Taking a snapshot of where the Loongson is now and comparing it against where AMD and Intel are now is flawed. The processor business chases moving targets: rather than comparing single samples, you need to look at a longer history to try to estimate the rate of change.
Intel started 30 years ago. The Loongson project started 9 years ago. In that time they have closed the gap on Intel to about 3 years. This 65nm design is comparable with something from about 2007 (the clock speed is lower but having 8 cores helps a lot). The real question is where they will go next.
If they meet their stated plan they are going to skip the 45nm node and make the Loongson 3B on a 28nm process. They are aiming at a higher clock speed, more cores and a large integrated vector co-processor that would rival Fusion or Larrabee. If they can do what they claim then they are in the process of overtaking Intel and AMD now, and we will see the effects on the world processor market over the next five years.
Whether or not they can do this is a big question, and according to the stories in the press it caused quite a debate at Hot Chips when they announced these plans. It's not clear who will be licensing them a 28nm fab, or quite how they've packed that much into a design. It's not clear how AMD and Intel will respond to a new competitor with state-backed funding and a huge protected market.
The next five years will be interesting times...
Wow. Word for word an exact rendition of the scene in swordfish...
Not now that you've made me read it twice...
Do you really think that grammar is the biggest issue with this piece of flamebait?
Although if you had we would also have to put up with such gems as:
"Man bought a subscription to slashdot."
It sounds as if distributing these large files would require some kind of transfer protocol to get the file from one machine to another. If only there was some way to integrate that with the hypertext markup language that we use on a daily basis! It would be like some kind of hypertext transfer protocol and then you could distribute your file just by sending somebody a link to where this file existed.
Oh well, keep on dreaming. It will be a long time before anybody invents a working HTTP with links that could be embedded in emails smaller than 10MB....
Why stop there? May as well make it only 1% thicker and 150 hours battery life.
Thanks for the link, and thanks to the other two posters for their replies. I stand happily corrected as I've learned something new.
How about you provide a link? The claim that you've given is way too vague to interpret properly but sounds basically wrong. More thermal paste = better conduction between the chip and the heatsink. Its function is to act as a highly conductive glue that bonds the two components together and allows heat to pass easily between them. There isn't an amount that would be "too much".
Awesome post, shame I have no mod points today.
Just yesterday I finished scraping some of the Canonical crap off of my Ubuntu desktop. Lo and behold, it is so well written that removing the message status icon from the taskbar corrupts everything in the gnome-settings registry and nukes the widget manager, causing some weird-ass fallback from 1997 to start rendering the controls in each window. The point of scraping it off was that Empathy is broken and no-one will fix it, making it unusable with a gtalk account, and Evolution is a giant steaming pile of cack.
The "integration" provided in Ubuntu seems to be a huge step backwards and within a few years the parasite will have killed the host by converting Ubuntu into a kiosk that sells mp3s...
Why?
You seem to be confusing time and space. If there were 1000 of them over one billion years then the probability of their light cone intersecting ours is tiny. Unless you assume that on reaching maturity they somehow become a galactic civilisation with a presence in every star system. Even big noises like broadcast TV and nuke tests only propagate at the speed of light. If each civilisation manages to make a big noise for 1000 years after inventing radio then you still need to be in the right point in space, at the right point in time, in order to hear them.
Sad fact of the matter is that all of the grand space opera visions of the future rely on FTL that just doesn't look feasible. The alternative is life scrabbling around in its own backyard before it destroys itself. Unless our immediate neighbours go through the same process at the same time it will look like we are alone. Of course this isn't a testable/falsifiable difference to your opinion - they're both observationally equivalent.
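A back-of-envelope version of the timing argument, as a sketch only. The 1000 civilisations, one billion years and 1000 years of broadcasting are the figures from above. The assumptions are mine: emergence times are spread uniformly over the whole span, and a signal is detectable at any distance (which is very generous). Distance then only delays when the shell of radio noise sweeps past us; it doesn't change how long it lasts.

```python
def expected_audible(n_civilisations, span_years, broadcast_years):
    """Expected number of civilisations whose signal is passing us right now.

    Assumes uniformly distributed emergence times and no range limit
    on detection.
    """
    # Each signal shell takes broadcast_years to sweep past a fixed point,
    # so it covers this fraction of the total time span.
    fraction_of_time = broadcast_years / span_years
    return n_civilisations * fraction_of_time


expected = expected_audible(1000, 1e9, 1000)
print(expected)
```

That works out at around 0.001, i.e. even with a thousand civilisations you would expect to hear none of them at any given moment, and any realistic limit on detection range only makes it worse.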
Really? I hadn't heard about that aspect of the story. So previously there were only small players, but the effect of this bug is causing large players to be created. Gosh, I hope they are not also affected by the problem.
So.... a really verbose version of:
Pc = find()
Hd = Pnxt-Pc
Obst = detect(Hd)
if Obst!=NULL
    Hds = turned_heading(Hd)
    detect()
Hate to say it, but the AC may have a point...
I didn't really read it like that. The GP makes a coherent argument for why this doesn't seem newsworthy - more than "I didn't really like it, meh".
The summary sounds pretty cool and grabbed my attention straightaway. If the article was actually about a hack that used the depth buffer to cluster points that move together (have consistent depth) into objects and then track them then it would be pretty cool. But really all he is doing is what the GP says - quantising the depth and projecting colours.
I would really like to see a hack that segments and clusters the depth buffer - that would be newsworthy, cool, useful and new.
Nah. Doesn't follow. I have made up the number 1222884, does this grant me exclusive rights over its use?
I have invented the equation x=1/27y, again do I get exclusive rights over its use?
If you think that code is different - then why? What law would grant it this particular status, and in which jurisdictions?
Why exactly would I need a license to use a piece of software?