Tile Based Rendering and Accelerated 3D
ChickenHead writes "AnandTech has put together a review
of the Hercules 3D Prophet 4500 based on the new Kyro II chip from STMicro.
What's unique about this particular chip is that it uses a Tile-based Rendering
Architecture which results in a much greater rendering efficiency than conventional
3D rendering techniques. It is so efficient in fact, that the $149 Kyro II card
clocked at 175MHz is able to outperform a GeForce2 Ultra with considerably more
power and around 3X the cost of the Kyro II card. With games not able to take
advantage of the recently announced GeForce3's feature set, the Kyro II may
be a cheap solution to tide you over until the programmable GeForce3 GPU becomes
a necessity." A very readable and interesting summary of an interesting technology, and a potentially extremely cool video card.
There is a serious problem with the memory bandwidth of current cards, but embedded memory promises to alleviate this situation.
I don't think it's a hard wall by any means.
As we all know, lack of competition always leaves consumers at a disadvantage. While this card won't be a hit among the GeForce3 target group, it could seriously cut into NVIDIA's market, along with the Radeon. And while tile rendering has some strengths and some weaknesses, who is to say which approach will run into the biggest problems... I doubt RAM, even DDR SDRAM, will get all that much faster, so if they could create a tile rendering chip that only needed the current bandwidth, it could really be something. Might Kyro be to GeForce what AMD is to Intel? Time will tell.
Live today, because you never know what tomorrow brings
It's not as simple as that. You can have partially overlapped polygons, amongst other things. Totally occluded polys can be culled without overdraw; partially occluded ones need some sort of clipping or culling, done one way or another, to render correctly (or you end up with gaps in the objects, etc.). Usually what is applied is a "painter's algorithm", which determines the order of the polys in depth and paints them onto the screen from back to front. That translates into overdraw. Some engines strive to minimize overdraw (such as Quake III) and others (such as Serious Sam) don't, letting the card deal with the problem. This is why you see such a disparity, with the Kyro II logging such high scores in Serious Sam while weighing in as a mid-range card elsewhere: Croteam apparently isn't concerning itself as much with partially occluded polys, so the Kyro doesn't render all the excess, non-visible info to display memory the way the GeForce and Radeon do.
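To make the overdraw point concrete, here is a toy sketch (not how the Kyro II or any real chip is implemented) that counts pixel writes for a hypothetical scene of opaque, screen-aligned rectangles. The painter's algorithm shades every covered pixel of every rectangle; resolving depth first and shading only the nearest surface writes each pixel once. The names `painters_writes` and `visible_only_writes` and the scene itself are made up for illustration.

```python
# Hypothetical scene of opaque screen-aligned rectangles: (x0, y0, x1, y1, depth),
# where a smaller depth value means nearer to the viewer.
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int, float]


def covered(rect: Rect, width: int, height: int):
    """Yield every on-screen pixel covered by a rectangle."""
    x0, y0, x1, y1, _ = rect
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            yield x, y


def painters_writes(rects: List[Rect], width: int, height: int) -> int:
    """Painter's algorithm: sort far-to-near and shade every covered pixel."""
    writes = 0
    for rect in sorted(rects, key=lambda r: r[4], reverse=True):
        for _ in covered(rect, width, height):
            writes += 1  # shaded and written, even if a nearer rect covers it later
    return writes


def visible_only_writes(rects: List[Rect], width: int, height: int) -> int:
    """Resolve depth per pixel first, then shade each visible pixel once."""
    nearest: Dict[Tuple[int, int], float] = {}
    for rect in rects:
        for pixel in covered(rect, width, height):
            if rect[4] < nearest.get(pixel, float("inf")):
                nearest[pixel] = rect[4]
    return len(nearest)  # one shade/write per visible pixel


scene = [(0, 0, 8, 8, 3.0), (2, 2, 6, 6, 2.0), (4, 4, 8, 8, 1.0)]
print(painters_writes(scene, 8, 8), visible_only_writes(scene, 8, 8))  # 96 64
```

An engine that already sorts and culls aggressively (the Quake III case) leaves less of that 96-vs-64 gap for the card to win back.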
Jury's still out on this design, but it looks promising to say the least. There are several developers trying to sweet-talk STMicroelectronics or Imagination out of register info to make Linux drivers right now because of the potential of the cards.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Actually, VideoLogic (I believe they were independent at this time; NEC invested in them first, then bought them outright) was going to implement a multi-way architecture using these things, but due to the Dreamcast thing it never really made it. As I remember it, the article said it would have been VERY easy to do multiple chips with this design, and it chastised 3DFx for taking such an inelegant approach when designing their own multichip solution (Voodoo2).
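One reason multiple tile renderers combine easily is that each tile is independent, so tiles can simply be dealt out to chips. The sketch below is a hypothetical assignment scheme only; the tile size (32 pixels), the chip count, and the diagonal interleave are assumptions, not VideoLogic's actual design.

```python
from collections import Counter
from typing import Dict, Tuple


def assign_tiles(width: int, height: int, tile: int, chips: int) -> Dict[Tuple[int, int], int]:
    """Map each tile index (tx, ty) to a chip id using a diagonal interleave.

    Interleaving (rather than giving each chip one big screen region) spreads
    busy and quiet parts of the screen across all chips.
    """
    tiles_x = -(-width // tile)  # ceiling division
    tiles_y = -(-height // tile)
    return {
        (tx, ty): (tx + ty) % chips
        for ty in range(tiles_y)
        for tx in range(tiles_x)
    }


work = Counter(assign_tiles(1024, 768, 32, 4).values())
print(work)  # each of the 4 chips gets 192 of the 768 tiles
```

Because no chip ever needs another chip's pixels, there is no shared framebuffer to fight over, which is the contrast with scan-line interleaving on the Voodoo2.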
A deep unwavering belief is a sure sign you're missing something...
Is it just me, or does the name Hercules take you back to the days of CGA?
Have they really been making cards under that brand all this time?
Yeah, a dual-head G400... nasty stuff. Absolutely *horrible* OpenGL implementation: at 1600x1200x32 it was running at 4fps. I promptly returned the card to the OEM that was offering it to my employer as a demo and declined. There was just no reason to put up with that... DirectX was *lightning* fast at the time; I mean, they put some REALLY GOOD work into the DirectX implementation, but the OpenGL stuff was just *vastly* inferior.
---
Video meliora proboque deteriora sequor - Ovidius
Actually, NVIDIA's cards are significantly faster for only $40 or so more. The price/performance should actually be better for a 64MB GTS than for a 64MB Radeon, but it depends on the benchmark and resolution.
Let's see: 17 million transistors. Four of them would be around 68 million transistors. Given that a GeForce3 has 57 million transistors, and there would probably be some overlap between the chips, it seems that this would be quite doable. Of course, at that point you'd probably need DDR memory to feed the 4 chips, but hey, RAM is cheap! (As in 1 GB (4 DIMMs) for $180 at Pricewatch!)
3dfx (not the 3DFX we loved, but 3dfx) never used tile rendering. They didn't support T&L because they were an opportunistic, crappy company that screwed its users, tried to maintain a monopoly on Glide games (using tricks that, had MS used them, would have /. collectively frothing at the mouth), and deserved the pissing on that it got from NVIDIA. Ever since the Voodoo2 failed to make the jump to 32-bit color (and the TNT did), 3dfx was on the way down for not being a leader in technology.
And if you'd read the article, you'd see that this card does achieve FSAA at a decent resolution with very good performance, and that the efficiency of the memory architecture is what really makes it compare well, by massively reducing the number of memory accesses.
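A plausible reason FSAA is cheap on a tile renderer: if the supersamples live only in the on-chip tile buffer and are averaged down before the tile is written out, external memory sees one write per final pixel instead of `factor * factor`. That on-chip resolve is an assumption here (the thread doesn't describe the Kyro II's exact FSAA method); `resolve_tile` is a hypothetical helper showing a plain box-filter resolve.

```python
from typing import List

Grid = List[List[float]]


def resolve_tile(samples: Grid, factor: int) -> Grid:
    """Box-filter a (factor x factor)-supersampled tile down to final pixels."""
    height = len(samples) // factor
    width = len(samples[0]) // factor
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            block = [
                samples[y * factor + j][x * factor + i]
                for j in range(factor)
                for i in range(factor)
            ]
            out[y][x] = sum(block) / len(block)  # average of the sample block
    return out


# 4x4 samples (one intensity value per sample) -> 2x2 final pixels
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
print(resolve_tile(samples, 2))  # [[0.0, 1.0], [1.0, 0.5]]
```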
If you'd read the article you'd have seen that they are releasing a lower-power version based on the same architecture as well, with a suggested price of around $79. But if you're looking for cheap machines for office applications, you should be looking at something with integrated chipsets instead. It's not like you'd normally put a 3D accelerated graphics card in a machine that is only intended for word processing or similar.
So you should still see a significant benefit - not as much as for opaque areas, though, as it can't just throw away the partially obscured pixels as it can with the totally hidden ones.
...I will wait for the obligatory Mr. Carmack response modded to +5. I'm hoping he's busy writing it now :)
Praying for the end of your wide-awake nightmare.
On the flip side of this, could tile-based rendering be implemented for the very lowest segment of the video card market: PCI cards for legacy desktops? Wouldn't tile-based rendering at least partially offset the performance hit from using PCI as opposed to AGP?
I'd like to find an inexpensive PCI card to replace the 2MB Mystique in my old PPro200... I guess there wouldn't be much of a profit margin, however.
Waltz, nymph, for quick jigs vex Bud.
Nah - people have designed graphics chips that hit 'perfect' fill rates before - I know I did one (for the Mac 7-8 years back) that hit 1.2Gb/sec into VRAM (then state of the art DRAM) exactly as it was designed to.
Graphics chips have a relatively long history that is at least in part driven by the commodity memory technologies available to them. These days we're particularly troubled: system costs are going down, and DRAM speeds haven't kept pace with CPU/GPU speed increases (CPUs have gone from maybe 100MHz to 1GHz in the time that memory has gone from 66MHz to 266MHz [transfer rate; latencies have only halved]).
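The gap in those rough figures works out as follows; the numbers are just the poster's round figures, not measurements.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster 'after' is than 'before'."""
    return after / before


cpu = speedup(100, 1000)    # CPU clock: 100MHz -> 1GHz
memory = speedup(66, 266)   # memory transfer rate: 66MHz -> 266MHz
latency = 2.0               # "latencies have only halved"

print(f"CPU {cpu:.1f}x, memory bandwidth {memory:.2f}x, latency {latency:.1f}x")
print(f"CPU has outpaced memory bandwidth by about {cpu / memory:.1f}x")
```

So compute grew about 10x while bandwidth grew about 4x and latency only 2x, which is why designs that cut memory traffic (rather than raise clocks) look attractive.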
'Tricks' like ISS (aka tiled frame buffers) work because they basically cache the problem, at the expense of keeping an ordered polygon list (which means you are more sensitive to scene complexity: too many more polys than pixels and you might be in big trouble) and of latency (because you can't start rendering until the poly sort stage is finished, you have to render a complete screen at once, while maybe buffering the next scene's polys in parallel). Note I'm oversimplifying the problems here to explain some of the issues; there's lots of scope for smart people to do smart things in a space like this (before all the patents are granted; then, without competition, innovation will probably cease :-( ).
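The "ordered polygon list" can be sketched as per-tile binning: every triangle is added, in submission order, to the list of each tile its screen bounding box touches. This is a generic illustration, not the Kyro II's actual binning logic; the tile size and the helper name `bin_triangles` are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def bin_triangles(
    triangles: List[Triangle], width: int, height: int, tile: int
) -> Dict[Tuple[int, int], List[int]]:
    """Build per-tile triangle lists from each triangle's screen bounding box.

    Bounding-box binning is conservative: a triangle may be listed in a tile
    it doesn't actually touch, which only costs extra work, never correctness.
    """
    tiles_x = -(-width // tile)  # ceiling division
    tiles_y = -(-height // tile)
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)  # submission order is preserved
    return bins


tris = [((1, 1), (10, 1), (1, 10)), ((20, 20), (40, 20), (20, 40))]
print(dict(bin_triangles(tris, 64, 64, 32)))
```

Every triangle in the frame has to be binned before any tile can be finished, which is where both the full-frame latency and the sensitivity to polygon count come from.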
Since tile based rendering eliminates overdraw, the effective fill rate of a tile based renderer can actually surpass the effective fill rate.
Wow! They can make the effective fill rate surpass the effective fill rate?! Maybe they can make my bank account balance surpass my bank account balance!
Disconnect your television. Do your own research. Draw your own conclusions. They're probably lying. Don't be a sheep.
I would LOVE to see this with T&L and high-bandwidth memory. If they can do well with these and fund further development to get a DDR version with T&L, we might have some competition for the GF3 next year.
Of course, The Carmack has spoken and does not agree with tile-based rendering right now; at its core it is kind of a kludge... hrm...
I wonder what he thinks of that Anandtech article.
Oh great and powerful Carmack, we ask that you grace us with your knowledge and wisdom in this time of confusion and shed light on the validity of tile-based rendering. Hear us!
Sigs are awesome huh?
This is an instance of the old ATM vs IP or CISC vs RISC debates. It's the old engineering tradeoff: work smart but slow, or work quick and dirty. Tile-based rendering is an instance of smart and slow, i.e. they do no more work than they have to, and thus get away with slower clocks and memory. The NVIDIA card is quick and dirty.
Historically, it is almost always the case that quick and dirty is the cheaper way to go, as it allows economies of scale to come into play. However, it is seeming more and more like the memory bandwidth bottleneck is here to stay, so the smart and slow approach is looking pretty good. Likewise, as we run into physical limitations for network bandwidth, IP is going to have a harder and harder time providing acceptable QoS and multicast solutions, and ATM-like technologies will start becoming more prevalent.
The benchmarks show 350M pixels/s rendered on a 175MHz chip with two pipelines, i.e. its full theoretical rate of two pixels per clock. I don't think anyone in the PC graphics industry has ever accomplished that. (I believe the Voodoo and other really early cards were held back by the time it took to set up all the polys on the CPU.)
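The fill-rate arithmetic can be sketched as below. The depth complexity of 3 is a hypothetical value, and the model ignores texturing, bandwidth, triangle setup and any early-Z rejection, so these are ceilings, not predictions. It also shows the sense in which a tile renderer's "effective" fill rate can exceed its raw fill rate: an immediate-mode chip that shades every layer would need raw fill times depth complexity to keep up.

```python
def fill_limited_fps(fill_rate: float, width: int, height: int, depth_complexity: float) -> float:
    """Upper bound on frames/s if pixel fill were the only limit."""
    return fill_rate / (width * height * depth_complexity)


raw_fill = 175e6 * 2   # 175MHz x 2 pipelines = 350M pixels/s
overdraw = 3.0         # hypothetical average depth complexity

# Immediate-mode renderer that shades every layer (no early Z rejection assumed)
immediate = fill_limited_fps(raw_fill, 1024, 768, overdraw)
# Renderer that shades only the visible layer of each pixel
visible_only = fill_limited_fps(raw_fill, 1024, 768, 1.0)

print(f"immediate: {immediate:.0f} fps, visible-only: {visible_only:.0f} fps")
print(f"immediate-mode chip would need {raw_fill * overdraw / 1e6:.0f}M pixels/s to match")
```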
Second, the point stands that this is quite new to the scene and that more bandwidth won't help.
BTW, thanks for the info.
According to the anandtech benchmarks:
QIII Arena 1024x768 @32bpp
GeForce2 GTS 64MB: 95.6fps
Radeon DDR 64MB: 80.6fps
That's a quite significant 15fps.
Q3 at 1600x1200 is unplayable on everything except the Ultra, but the GeForce2 GTS still wins.
MDK 1024x768 @32bpp
GeForce2 GTS 64MB: 105.9fps
Radeon DDR 64MB: 86.8fps
Again, about 19 more fps at this res.
MDK 1600x1200 @32bpp
GeForce2 GTS 64MB 43.3fps
Radeon DDR 64MB: 38.2fps
Only 5fps faster, but that's around 13% faster.
Unreal Tournament 1024x768 @32bpp (avg)
GeForce2 GTS 64MB: 84.5fps
Radeon DDR 64MB: 87.8fps.
Here the Radeon DDR wins, but only by 3fps.
Unreal Tournament 1600x1200 @32bpp (min)
GeForce2 GTS 64MB: 34.3fps
Radeon DDR 64MB: 18.8fps
Ouch. What were you saying about high resolutions?
The GTS is playable, the Radeon is not.
Unreal Tournament 1600x1200 @32bpp (avg)
GeForce2 GTS 64MB: 68.9fps
Radeon DDR 64MB: 56.9fps
The GTS is 12fps faster here.
Serious Sam 1024x768 @32bpp
GeForce2 GTS 64MB: 47.2fps
Radeon DDR 64MB: 50.1fps
The Radeon wins, but it's only 3fps faster.
Serious Sam 1600x1200 @32bpp
GeForce2 GTS 64MB: 22.5fps
Radeon DDR 64MB: 24.7fps
A hair over 2fps faster.
Mercedes-Benz 1600x1200 @32bpp
GeForce2 GTS 64MB: 20.9fps
Radeon DDR 64MB: 24.2fps
The only decisive victory for the Radeon. Still, at the only playable resolution (640x480) the GTS wins 64.7 to 57.8.
So overall, the Radeon is a good card, but NVIDIA still has a significant speed advantage and, for only a little bit more, is worth it, in my opinion. (Not to mention the fact that they have better drivers and pro-caliber OpenGL!)
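For anyone who wants to check the margins, this tabulates the frame rates quoted in the list above (all at 32bpp) and computes the GTS lead in fps and as a percentage of the Radeon's score; negative values mean the Radeon wins.

```python
# (GeForce2 GTS 64MB, Radeon DDR 64MB) frame rates in fps, 32bpp, from the list above
results = {
    "Quake III Arena 1024x768": (95.6, 80.6),
    "MDK 1024x768": (105.9, 86.8),
    "MDK 1600x1200": (43.3, 38.2),
    "Unreal Tournament 1024x768 (avg)": (84.5, 87.8),
    "Unreal Tournament 1600x1200 (min)": (34.3, 18.8),
    "Unreal Tournament 1600x1200 (avg)": (68.9, 56.9),
    "Serious Sam 1024x768": (47.2, 50.1),
    "Serious Sam 1600x1200": (22.5, 24.7),
    "Mercedes-Benz 1600x1200": (20.9, 24.2),
}


def compare(gts: float, radeon: float):
    """Return (GTS lead in fps, GTS lead as % of the Radeon's fps)."""
    diff = gts - radeon
    return diff, 100.0 * diff / radeon


for name, (gts, radeon) in results.items():
    diff, pct = compare(gts, radeon)
    print(f"{name:36s} {diff:+6.1f} fps  {pct:+6.1f}%")
```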
Interesting. See, the article kept saying "It's great value for the price--sometimes it even beats a GF Ultra". No one said it was superior to the most expensive consumer 3D hardware...
Space Invaders runs *SO* fast on this card, like 23000 FPS
--
Je t'aime Stéphanie
By the way, did you know you can use the Dreamcast Broadband Adapter to connect to your PC for some do-it-yourself development? Very cool...
If the poster had read the benchmarks, it would be obvious that the case is not so cut and dried. The card wins at some things and loses at others. It loses to the GF2 GTS in some benchmarks, and beats the GF2 Ultra in others. A very cool card, and worlds beyond anything in its price range, however. This should do very good things for the low-priced performance market as a whole, by pushing down other prices and by providing a cool new technology.
I realize that the Kyro offers a very good price/performance ratio, but why don't they offer a model (for a higher price, obviously) with higher memory clocks? This way, those who wanted to pay for more performance could do so, and they could continue to sell their current cards at their competitive price.
Isn't this why the GeForce2 Ultras even exist? Some people always want the fastest cards, and are willing to pay premiums to be on the bleeding edge... my guess is that vendors reap a higher percentage profit on each unit sold to the "bleeding edgers"...
This actually sounds pretty damn cool, and with a little luck will provide some nice competition for nVidia. Since 3Dfx went bye-bye, I have been a little worried that nVidia would be the only real gaming card supplier (well, I guess that depends on whether you count ATi).
"Useless organic meatbag" -HK-47
My next upgrade will be the video card. I've been interested in AA ever since I heard it was available on a video card. If you check out the article, this new card has better AA performance than the GeForce2 Ultra.
Very interesting.
Good thing I have to wait a few months anyways.
Later
ErikZ
Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
Tile-based rendering's big benefit is that it reduces overdraw to 0; that is, each opaque pixel on the screen is drawn exactly once. Performance for certain types of scenes is spectacular.
Dreamcast uses this, as do many of Sega's arcade systems (HOTD2, for instance), which use the same PowerVR2 rendering system.
Where tile-based rendering falls down, however, is for scenes that contain a large amount of alpha-blended areas. Alpha-blended areas in today's hardware are necessarily drawn multiple times, from back-to-front, to accomplish transparency effects. Having to draw the pixel several times nullifies the zero-overdraw benefit of tile rendering. Since most tile-rendering systems trade fill-rate for zero overdraw, cards with insufficient fill rate for large alpha areas (read: all of them) fall down on large, alpha blended polygons. You can see this in House of the Dead 2 when fighting the Hierophant; if you get enough water splash effects on the screen, the frame rate chokes.
Tile rendering works extremely well for areas that are opaque, or use only small alpha-blended areas. It's getting better; it's just not perfect yet.
Mumbly Joe
While it may be that the PowerVR2 did not implement it correctly, nothing prevents a tile renderer from performing much better than immediate-mode rasterizers here. Consider it this way:
A game needs to draw 5 opaque polygons, with 3 alpha polygons on top.
An immediate mode rasterizer would have to write all five polygons to memory, including all of the associated texture lookups and lighting calculations. Then, for each alpha polygon, it would have to reread bits from the framebuffer and combine it with the shaded textured alpha polygon. This is a lot of memory traffic.
A tile based renderer, otoh, would not need to do all of this. Obviously it would be able to eliminate all of the overdraw on the opaque polygons, but it would also be able to do the blending in the ON CHIP 24bit tile framebuffer, which is much much much faster than going to off chip memory. This means that instead of having to do read-modify-write off chip memory cycles for each of those alpha blended polygons, it stays on chip.
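A back-of-the-envelope version of that 5-opaque-plus-3-alpha example, counting external framebuffer bytes per pixel. Assumptions (not from the article): 32-bit color and 32-bit Z, back-to-front submission where every opaque layer passes the depth test, no early-Z, and texture fetches ignored. The helper names are hypothetical.

```python
BYTES_COLOR = 4  # assumed 32-bit color
BYTES_Z = 4      # assumed 32-bit depth


def immediate_bytes_per_pixel(opaque: int, alpha: int) -> int:
    """Off-chip framebuffer traffic for an immediate-mode rasterizer."""
    per_opaque = BYTES_Z + BYTES_Z + BYTES_COLOR           # read Z, write Z, write color
    per_alpha = BYTES_Z + BYTES_COLOR + BYTES_COLOR        # read Z, read color, write blend
    return opaque * per_opaque + alpha * per_alpha


def tile_bytes_per_pixel(opaque: int, alpha: int) -> int:
    """Off-chip traffic when depth test and blending stay in the on-chip tile.

    Only the finished color leaves the chip, so the layer counts don't matter.
    """
    return BYTES_COLOR


pixels = 1024 * 768
print(immediate_bytes_per_pixel(5, 3) * pixels, "bytes vs", tile_bytes_per_pixel(5, 3) * pixels)
```

Under these assumptions the immediate-mode card moves 96 bytes per pixel against 4 for the tile renderer; real chips with Z compression or early-Z would land somewhere in between.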
Now like I said before, I am not familiar with the PowerVR2 chip, and it may be that they do not implement this obvious optimization... I would assume their newer chip would.
My big question is "why no T&L unit?" It seems like a severe handicap to an otherwise stellar chip. Although the article somewhat addresses it, they didn't really justify it well, and the benchmarks prove it would be handy. Maybe the 175MHz clock is what prevents an effective T&L unit from being added...
-Chris
Here's how it works:
Anyway, because the system uses ZERO memory bandwidth for Z-buffer calculations, the system is far more efficient, even though it is essentially traversing the scene dozens of times for each frame.
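A minimal sketch of the per-tile loop, assuming opaque screen-aligned rectangles and a small Python dict standing in for the on-chip depth and color buffers. It walks the whole scene once per tile (the "traversing the scene dozens of times" part) and counts external memory writes, which happen only when a finished tile's colors are copied out; depth values never leave the tile. Real hardware would walk per-tile polygon lists instead of the full scene; `render_tiled` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (x0, y0, x1, y1, depth, color); smaller depth = nearer
Rect = Tuple[int, int, int, int, float, str]


def render_tiled(rects: List[Rect], width: int, height: int, tile: int):
    framebuffer: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    external_writes = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            depth = {}  # stands in for the on-chip Z buffer of this tile
            color = {}  # stands in for the on-chip color buffer of this tile
            for x0, y0, x1, y1, d, c in rects:  # full scene traversal per tile
                for y in range(max(ty, y0), min(ty + tile, y1, height)):
                    for x in range(max(tx, x0), min(tx + tile, x1, width)):
                        if d < depth.get((x, y), float("inf")):
                            depth[(x, y)] = d
                            color[(x, y)] = c
            for (x, y), c in color.items():  # only final colors go off chip
                framebuffer[y][x] = c
                external_writes += 1
    return framebuffer, external_writes


fb, writes = render_tiled([(0, 0, 8, 8, 3.0, "far"), (2, 2, 6, 6, 1.0, "near")], 8, 8, 4)
print(fb[3][3], fb[0][0], writes)  # near far 64
```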
This is why the Sega Dreamcast is often able to have better performance than the PlayStation 2.
Cryptnotic
My other first post is car post.
There is a good article on it, as applied to the PowerVR (which uses the same kind of architecture), at http://www.ping.be/powervr/PVRSGRendMain.htm. As others already said, you can see the results on the Dreamcast, or on the arcade version, the Naomi.
The strengths are obvious:
The weaknesses are a little less obvious:
As a result, these cards are nice, but mostly represent another set of tradeoffs, not necessarily a revolution.
OG.
If I remember correctly, GigaPixel's architecture was also tile-based, and I believe they had spent quite some time trying to head off the known issues with tile architectures (though I honestly don't know how successful they were; the demos I saw were a while ago and looked good, but things have changed since then).
Of course, GigaPixel was acquired by 3dfx for approx. 300 Million US$ after initially winning the XBox graphics contract and then having it pulled from beneath them. And of course 3dfx was in turn acquired (though for only 150-160 Million US$?) by nVidia. So if tile-based rendering has a future (and GigaPixel's is good), perhaps we can expect to see it from nVidia too before long.
Erm... the whole point of this is that it doesn't *need* DDR. Adding DDR to it would *not* increase its performance whatsoever.
That said, adding four of these inline and jumping to DDR would be decidedly sweet. The chips are fairly small, which would facilitate this, but I'm not sure if they are capable of that... since they just work on tiles, I can't see why you couldn't assign each one a section of the scene, but... who knows.
It will be quite a while before hardware T&L comes out on these, I think, considering that this iteration is only just being released.
If you want to find out what is amazing about this card, read on: this card is based on NEC's PowerVR architecture, and is really nothing more than the PowerVR2 clocked up to 175MHz. What's funny is, I remember getting excited about this card over 3 years ago!! If you want to do more research on the architecture, dig up some old articles on Tom's Hardware, where he benches it with Quake 1. At the time, the card was supposed to clean up the market, and it was going to debut at a 125MHz core/memory speed. (This was when the Voodoo1 was the standard and the Voodoo2 had just entered the scene. I remember holding out for this card, and finally settled on a TNT when I found out that NEC had decided to drop out of the PC market.) Then NEC made a deal with Sega, and put the chip in the Dreamcast. What's even more amazing about the chip is that ST simply had to raise the clock to 175MHz to make it competitive with NVIDIA's GeForce2 Ultra. What I think will be scary is when they revamp this 4-year-old chip design and add T&L. Imagine what a chip like this could do with DDR RAM instead of SDRAM. This current chip only supports SDRAM, which is why they didn't put DDR RAM on the card. I think NVIDIA has their work cut out for them. Hopefully they will be able to license tile-based rendering for their next card. I was really hoping that they would put it in the GeForce3; it would have made quite a bit more difference than a crossbar memory architecture.
This design is very similar (if not identical) to NEC's PowerVR and PowerVR2 chipsets.
That's because the Kyro/Kyro II use the PowerVR3 architecture. NEC used to partner with Imagination to produce those older chips.
-----
#o#
#o#
O Moo.
The article talks about the Windows drivers (complaining a bit about them; I assume they're still in development, though). It does mention OpenGL support in the Windows drivers...
Does anyone know if there will be DRI support for this chipset any time soon? One of these days I'll have to upgrade from my old Voodoo Banshee card...
---
"They have strategic air commands, nuclear submarines, and John Wayne. We have this"
Hacker Public Radio is our Friend
It sounds like a nice card and yadda yadda. Are they going to supply X drivers with GLX support? Otherwise the card isn't worth buying.