Tile Based Rendering and Accelerated 3D
ChickenHead writes "AnandTech has put together a review
of the Hercules 3D Prophet 4500 based on the new Kyro II chip from STMicro.
What's unique about this particular chip is that it uses a Tile-based Rendering
Architecture which results in a much greater rendering efficiency than conventional
3D rendering techniques. It is so efficient in fact, that the $149 Kyro II card
clocked at 175MHz is able to outperform a GeForce2 Ultra with considerably more
power and around 3X the cost of the Kyro II card. With games not able to take
advantage of the recently announced GeForce3's feature set, the Kyro II may
be a cheap solution to tide you over until the programmable GeForce3 GPU becomes
a necessity." A very readable summary of an interesting technology and a potentially very cool video card.
I will wait for the obligatory Mr. Carmack response modded to +5. I'm hoping he's busy writing it now :)
Praying for the end of your wide-awake nightmare.
Nah - people have designed graphics chips that hit 'perfect' fill rates before - I know I did one (for the Mac 7-8 years back) that hit 1.2Gb/sec into VRAM (then state of the art DRAM) exactly as it was designed to.
Graphics chips have a relatively long history that is at least in part driven by the commodity memory technologies available to them. These days we're particularly troubled: system costs are going down, and DRAM speeds haven't kept pace with CPU/GPU speed increases (CPUs have maybe gone from 100MHz to 1GHz in the time that memory has gone from 66MHz to 266MHz [transfer rate; latencies have only halved]).
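To put rough numbers on that gap, here is a quick back-of-the-envelope calculation using only the clock figures quoted above. The variable names are just for illustration, and clock ratios are at best a crude proxy for real throughput.

```python
# Figures quoted in the comment above (MHz); names are illustrative only.
cpu_then, cpu_now = 100, 1000
mem_then, mem_now = 66, 266

cpu_gain = cpu_now / cpu_then   # how much faster CPU clocks got
mem_gain = mem_now / mem_then   # how much faster memory transfer rates got
gap = cpu_gain / mem_gain       # how far memory fell behind, relatively

print(f"CPU clock: {cpu_gain:.1f}x, memory transfer rate: {mem_gain:.1f}x, gap: {gap:.1f}x")
```

That works out to roughly 10x for the CPU against about 4x for memory, so memory has fallen behind by something like 2.5x over the same period.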
'Tricks' like ISS (aka tiled frame buffers) work because they basically cache the problem. The cost is keeping an ordered polygon list (which makes you more sensitive to scene complexity: too many more polys than pixels and you might be in big trouble) and latency (you can't start rendering until the poly sort stage has finished, so you have to render a complete screen at once, while maybe buffering the next scene's polys in parallel). Note I'm oversimplifying the problems here to explain some of the issues; there's lots of scope for smart people to do smart things in a space like this (before all the patents are granted, after which, without competition, innovation will probably cease :-( )
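As a rough illustration of the polygon-list bookkeeping described above, here is a minimal sketch that bins triangles into screen tiles by bounding box. The tile size, the function name `bin_triangles`, and the use of integer pixel coordinates are all assumptions made for the example, not details of any real chip.

```python
# Hypothetical sketch: bin triangles into screen tiles by bounding box.
TILE = 32  # assumed tile size in pixels, not taken from any real hardware


def bin_triangles(triangles, width, height, tile=TILE):
    """Return {(tx, ty): [triangle indices]} for each tile a bounding box touches."""
    cols = (width + tile - 1) // tile
    rows = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(rows) for tx in range(cols)}
    for i, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(0, min(xs)) // tile
        x1 = min(width - 1, max(xs)) // tile
        y0 = max(0, min(ys)) // tile
        y1 = min(height - 1, max(ys)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(i)
    return bins


triangles = [
    [(0, 0), (10, 0), (0, 10)],      # fits inside tile (0, 0)
    [(20, 20), (40, 20), (20, 40)],  # bounding box straddles four tiles
]
bins = bin_triangles(triangles, 64, 64)
```

Every triangle has to be binned before any tile can be rendered, which is where the latency comes from, and the per-tile lists grow with polygon count, which is the sensitivity to scene complexity.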
This is an instance of the old ATM vs IP or CISC vs RISC debates. It's the old engineering tradeoff: work smart but slow, or work quick and dirty. Tile-based rendering is an instance of smart and slow, i.e. it does no more work than it has to, and thus gets away with slower clocks and memory. The NVIDIA card is quick and dirty.
Historically, it is almost always the case that quick and dirty is the cheaper way to go, as it allows economies of scale to come into play. However, it looks more and more like the memory bandwidth bottleneck is here to stay, so the smart and slow approach is looking pretty good. Likewise, as we run into physical limitations on network bandwidth, IP is going to have a harder and harder time providing acceptable QoS and multicast solutions, and ATM-like technologies will start becoming more prevalent.
Interesting. See, the article kept saying "It's great value for the price--sometimes it even beats a GF Ultra". No one said it was superior to the most expensive consumer 3D hardware. . .
Space invaders *SO* fast on this card, like 23000 FPS
--
Je t'aime Stéphanie
By the way, did you know you can use the Dreamcast Broadband Adapter to connect to your PC for some do-it-yourself development? Very cool...
If the poster had read the benchmarks, it would be obvious that the case is not so cut and dried. The card wins at some things, loses at others. It loses to the GF2 GTS in some benchmarks, and beats the GF2 Ultra in others. A very cool card, and worlds beyond anything in its price range, however. This should do very good things for the low-price performance market as a whole, by pushing down other prices and by providing a cool new technology.
The answer is very simple: the chip doesn't need it. Read the article, look at the later benchmarks -- the chip is actually achieving its theoretical fill rate. This has never happened before in the entire history of the graphics chip industry except perhaps in their previous chips. This is amazingly new. If they gave it more memory bandwidth, guess what -- the numbers would be the same. The whole point is the chip is so good at what it does it doesn't need the bandwidth. Now, if they went to four pipelines and a DDR interface, that would be cool. But the tiling architecture may not be that fast.
Tile-based rendering's big benefit is that it reduces overdraw to 0; that is, each opaque pixel on the screen is drawn exactly once. Performance for certain types of scenes is spectacular.
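To make the overdraw point concrete, here is a toy comparison that counts color writes for opaque, axis-aligned rectangles. It is only a sketch: real hardware works on triangles and tiles, and the function names and the `(x0, y0, x1, y1, z)` rectangle format are assumptions made for the example.

```python
# Toy model: count color writes for opaque rectangles given as (x0, y0, x1, y1, z).
# Smaller z is nearer. Function names and the rectangle format are illustrative.
INF = float("inf")


def immediate_mode_writes(rects, width, height):
    """Draw in submission order with a Z-buffer; count every write that passes."""
    depth = [[INF] * width for _ in range(height)]
    writes = 0
    for x0, y0, x1, y1, z in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < depth[y][x]:
                    depth[y][x] = z
                    writes += 1  # this pixel may be overwritten again later
    return writes


def deferred_writes(rects, width, height):
    """Resolve visibility for all rectangles first, then write each pixel once."""
    nearest = {}
    for x0, y0, x1, y1, z in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < nearest.get((x, y), INF):
                    nearest[(x, y)] = z
    return len(nearest)  # one color write per covered pixel


# Three full-screen layers submitted back to front: worst case for a Z-buffer.
layers = [(0, 0, 8, 8, 3.0), (0, 0, 8, 8, 2.0), (0, 0, 8, 8, 1.0)]
immediate = immediate_mode_writes(layers, 8, 8)
deferred = deferred_writes(layers, 8, 8)
```

With three full-screen layers submitted back to front, the immediate-mode count is three writes per pixel while the deferred count is one, which is the "drawn exactly once" property in miniature.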
Dreamcast uses this, as well as many of Sega's arcade systems (HOTD2, for instance), which use the same PowerVR2 rendering system.
Where tile-based rendering falls down, however, is for scenes that contain a large amount of alpha-blended areas. Alpha-blended areas in today's hardware are necessarily drawn multiple times, from back-to-front, to accomplish transparency effects. Having to draw the pixel several times nullifies the zero-overdraw benefit of tile rendering. Since most tile-rendering systems trade fill-rate for zero overdraw, cards with insufficient fill rate for large alpha areas (read: all of them) fall down on large, alpha blended polygons. You can see this in House of the Dead 2 when fighting the Hierophant; if you get enough water splash effects on the screen, the frame rate chokes.
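A small sketch of why stacked transparent layers cost the same per pixel on any architecture: each layer has to be blended over what is already there, back to front. The `blend_over` and `composite` names and the `(depth, color, alpha)` layer format are assumptions for illustration, and this uses plain "over" blending with non-premultiplied colors.

```python
# Illustrative only: back-to-front "over" blending of translucent layers at one pixel.
def blend_over(dst, src, alpha):
    """Blend src over dst with the given opacity (non-premultiplied colors)."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


def composite(background, layers):
    """layers: list of (depth, color, alpha); larger depth means farther away."""
    color = background
    blends = 0
    # Sort farthest first so nearer layers are blended on top.
    for _, src, alpha in sorted(layers, key=lambda layer: layer[0], reverse=True):
        color = blend_over(color, src, alpha)
        blends += 1  # one extra pass over the pixel per translucent layer
    return color, blends


pixel, blends = composite(
    (0.0, 0.0, 0.0),
    [(1.0, (1.0, 0.0, 0.0), 0.5),   # near red layer, 50% opaque
     (2.0, (0.0, 0.0, 1.0), 0.5)],  # far blue layer, 50% opaque
)
```

The number of blends grows with the number of overlapping translucent layers, so a screen full of splash effects multiplies the per-pixel work no matter how cleverly the opaque geometry was handled.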
Tile rendering works extremely well for areas that are opaque, or use only small alpha-blended areas. It's getting better; it's just not perfect yet.
Mumbly Joe
Here's how it works:
Anyway, because the system uses ZERO memory bandwidth for Z-buffer calculations, the system is far more efficient, even though it is essentially traversing the scene dozens of times for each frame.
This is why the Sega Dreamcast is often able to have better performance than the Playstation 2.
Cryptnotic
My other first post is car post.
There is a good article on it, as applied to the PowerVR (which uses the same kind of architecture), at http://www.ping.be/powervr/PVRSGRendMain.htm. As others already said, you can see the results on the Dreamcast, or on the arcade version, the Naomi.
The strengths are obvious:
The weaknesses are a little less obvious:
As a result, these cards are nice, but mostly represent another set of tradeoffs, not necessarily a revolution.
OG.
If you want to find out what is amazing about this card, read on: This card is based on NEC's PowerVR architecture, and is really nothing more than the PowerVR2 clocked up to 175 MHz. What's funny is, I remember getting excited about this card over 3 years ago!! If you want to do more research on the architecture, dig up some old articles on Tom's Hardware, where he benches it with Quake 1. At the time, the card was supposed to clean up the market, and it was going to debut at 125 MHz core/memory speed. (This was at the time when the Voodoo1 was the standard, and the Voodoo2 had just entered the scene. I remember holding out for this card, and simply settled on a TNT when I found out that NEC had decided to drop out of the PC market.) Then NEC made a deal with Sega, and put the chip in the Dreamcast. What's even more amazing about the chip is that ST simply had to change the clock to 175 MHz to make it competitive with NVIDIA's GeForce2 Ultra. What I think will be scary is when they revamp this 4 year old chip design and add T&L. Imagine what a chip like this could do with DDR RAM instead of SDRAM. This current chip only supports SDRAM, which is why they didn't put DDR RAM on the card. I think NVIDIA has their work cut out for them. Hopefully they will be able to license tile-based rendering for their next card. I was really hoping that they would put it in the GeForce3; it would have made quite a bit more of a difference than a crossbar memory architecture.