Tile Based Rendering and Accelerated 3D
ChickenHead writes "AnandTech has put together a review
of the Hercules 3D Prophet 4500 based on the new Kyro II chip from STMicro.
What's unique about this particular chip is that it uses a Tile-based Rendering
Architecture which results in a much greater rendering efficiency than conventional
3D rendering techniques. It is so efficient in fact, that the $149 Kyro II card
clocked at 175MHz is able to outperform a GeForce2 Ultra with considerably more
power and around 3X the cost of the Kyro II card. With games not able to take
advantage of the recently announced GeForce3's feature set, the Kyro II may
be a cheap solution to tide you over until the programmable GeForce3 GPU becomes
a necessity." A very readable summary of an interesting technology and a potentially very cool video card.
increasing efficiency? shmuck! go read the article!!
The Windows drivers for the G200 improved dramatically - you can even play Half-Life (Counter-Strike) with a Matrox G200 (system specs: Celeron 366@550MHz, 128MB, G200 AGP). Unfortunately they fixed their OpenGL drivers *WAY* too late, as the competition had passed them by (on the G200 at least). 4 fps sounds horrible - was this a G400?
If there is going to be support it is pretty much always release date + ~6 months. Developers have to get the card, figure out how to write a driver for it (if the company want release specs), and then write the driver... Unless this company is linux friendly don't expect a driver right away.
Well, for a price-conscious environment, you can still pick up some old 2nd-generation cards (such as the Voodoo 2s and 3s) for $20-80. They still run Quake (even Q2 with some tweaking) just fine, and for general computing they put out a good resolution at a good refresh rate. I picked up an old PCI V3 2000 for $40 at a computer show (which is more than I needed for a little server), and you can probably still get them at that price (if not lower) in quantity (even though 3dfx is out of business, there is no shortage of them).
--
--
#nohup cat
There is a serious problem with the memory bandwidth of current cards, but embedded memory promises to alleviate this situation.
I don't think it's a hard wall by any means.
er, sorry, "won't release specs" not "want.."
As we all know, lack of competition always leaves the consumers at a disadvantage. While this card won't be a hit among the Geforce3 target group, it could seriously cut into nVidia's market, along with the Radeon. And while tile rendering has some strengths and some weaknesses, who is to say who'll run into the biggest problems... I doubt RAM, even DDR SDRAM, will go all that much faster, so if they could create a tile rendering chip that needed the current bandwidth, it could really be something. Might Kyro be to Geforce what AMD is to Intel? Time will tell.
Live today, because you never know what tomorrow brings
The PowerVR2 chips empirically choke on large transparent textures (House of the Dead 2 on the Naomi arcade hardware, which is PowerVR2-based, is a good example), so you can draw your own conclusions as to whether or not they implemented that optimization.
Very nice. I was dreading the cost of the Geforce3, but kind of resigned myself to buying it.
This sort of thing could really scare nVidia if it takes off; it'd be interesting to see if they come out with a Geforce3 Lite, or something, in order to compete with it.
I've been using an old G200 for QuakeIII and Half-Life. 45fps with little tweaking--and on a PII-350!
It's not as simple as that. You can have partially overlapped polygons amongst other things. Totally occluded polys can be culled without overdraw- partially occluded ones need some sort of clipping/culling done in one way or another to render right (Or you end up with gaps in the objects, etc.). Usually what is applied is a "painter's algorithm" which determines which order in space the polys are and paints them in order on the screen. That translates into overdraw. Some engines strive to minimize overdraw (such as Quake III) and others (such as Serious Sam) don't, letting the card deal with the problem. This is why you see such a disparity with Serious Sam logging such high scores for the Kyro II and the Kyro weighing in as a mid-range card- Croteam's apparently not concerning themselves as much about partially occluded polys and as such the Kyro's not rendering all the excess, non-visible info to the display memory like the GeForce and Radeon do.
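To put a number on the overdraw argument, here is a minimal sketch that counts framebuffer writes for a back-to-front painter's pass versus a renderer that resolves visibility before writing. The scene, the rectangle-shaped "polygons", and the function names are all made up for illustration, and everything is assumed opaque; it is not how any particular card or engine works.

```python
# Hypothetical scene: each polygon is an axis-aligned rectangle
# (x0, y0, x1, y1, depth); a smaller depth means closer to the viewer.

def covered_pixels(rect):
    """List every pixel coordinate the rectangle covers."""
    x0, y0, x1, y1, _depth = rect
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)]

def painter_writes(rects):
    """Back-to-front drawing: every covered pixel is written, visible or not."""
    writes = 0
    for rect in sorted(rects, key=lambda r: r[4], reverse=True):
        writes += len(covered_pixels(rect))
    return writes

def deferred_writes(rects):
    """Resolve visibility first, then write each visible pixel exactly once."""
    nearest = {}
    for rect in rects:
        for pixel in covered_pixels(rect):
            if pixel not in nearest or rect[4] < nearest[pixel]:
                nearest[pixel] = rect[4]
    return len(nearest)

scene = [
    (0, 0, 8, 8, 3.0),  # background wall
    (2, 2, 6, 6, 2.0),  # partially overlaps the wall
    (4, 4, 8, 8, 1.0),  # partially overlaps both
]
print(painter_writes(scene), deferred_writes(scene))
```

The gap between the two counts is the overdraw that an engine like Quake III tries to avoid up front and that a deferred renderer avoids in hardware.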
Jury's still out on this design, but it looks promising to say the least. There's several developers trying to sweet-talk STMicroelectronics or Imagination out of register info to make Linux drivers right now because of the potential of the cards.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Actually, Videologic (I believe they were independent at this time; NEC invested in them first, then bought them outright) was going to implement a multi-way architecture using these things, but due to the Dreamcast thing, it never really made it. As I remember it, the article said that it would have been VERY easy to do multiple chips for this, and they chastised 3DFx for taking such an inelegant approach when designing their own multichip solution (Voodoo2).
A deep unwavering belief is a sure sign you're missing something...
Is it just me, or does the name Hercules bring you back to the days of CGA?
Have they really been making cards under that brand all this time?
yeah, a Dual Head G400 ... nasty stuff. Absolutely *horrible* OpenGL implementation; at 1600x1200x32 it was running 4fps. I promptly returned the card to the OEM that was offering it to my employer as a demo and declined. There was just no reason to put up with that... DirectX was *lightning* at the time, I mean, they put some REALLY GOOD work into the DirectX implementation, but the OpenGL stuff was just *vastly* inferior.
---
Video meliora proboque deteriora sequor - Ovidius
Actually, NVIDIA's cards are significantly faster at only $40 or so price margin. The price/performance should actually be better for a 64MB GTS than a 64MB Radeon, but it depends on the benchmark and resolution.
A deep unwavering belief is a sure sign you're missing something...
Let's see: 17 million transistors. 4 of them would be around 68 million transistors. Given that a GeForce 3 has 57 million transistors, and there would probably be some overlap between the chips, it seems that this would be quite doable. Of course, at that point you'd probably need DDR memory to feed the 4 chips, but hey, RAM is cheap! (As in 1 GB (4 DIMMs) for $180 at pricewatch!)
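The back-of-the-envelope budget above, written out as a tiny sketch. The only inputs are the two transistor counts quoted in the comment; the function name is invented, and it assumes no logic is shared between chips, which is the pessimistic case.

```python
KYRO2_TRANSISTORS_M = 17     # millions, figure quoted in the comment
GEFORCE3_TRANSISTORS_M = 57  # millions, figure quoted in the comment

def combined_budget(chips, per_chip_m=KYRO2_TRANSISTORS_M):
    """Naive total in millions, assuming nothing is shared between chips."""
    return chips * per_chip_m

total = combined_budget(4)
print(total, total - GEFORCE3_TRANSISTORS_M)
```

Any logic the four chips could share would pull the total down toward the GeForce 3 figure.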
A deep unwavering belief is a sure sign you're missing something...
Not necessarily... The Kyro II retails for $149.99; I just bought an eVGA GeForce2 GTS Pro from a local wholesaler for under $170 and I've seen Radeon DDR cards for as low as $130 locally.
The Radeon performs slightly worse than the Kyro II, the GF2-Pro slightly better. Thus, I'd say that the Kyro II is right in line with other cards in its price range.
STOP . AMERICA . NOW
3dfx (not the 3DFX we loved, but 3dfx) never used tile-rendering. They didn't support T&L because they were an opportunistic crappy company that screwed its users, tried to maintain a monopoly on Glide games (and used tricks such that if MS had used them, /. would be collectively frothing at the mouth) and deserved the pissing on that it got from NVIDIA. Ever since the Voodoo2 didn't make the jump to 32bit color (and the TNT did) 3dfx was on the way down for not being a leader in technology.
A deep unwavering belief is a sure sign you're missing something...
I had the sinking sensation that it's simply doing things the way old software renderers used to do it (especially the good old demos by Future Crew and friends).
I had the exact same feeling while reading through AnandTech's write-up.
And seeing such a huge performance difference between Q3 and Serious Sam, I wondered if it didn't come from the fact that Serious Sam's 3D engine "wastes" more bandwidth. Which would again explain the huge difference in the Fill Rate measured with Serious Sam.
While Quake3's engine would be more effective, sending less hidden polys to the card.
I remember from the days I fooled around making a couple of Q2 levels that well-designed levels (mostly id's) were very optimised in a BSP-tree kind of way (I sure hope what I'm saying makes sense, because I'm really far from a 3D guru).
I for one would sure LOVE to hear John Carmack's point of view on such a technique, as he is probably the most thorough graphics card analyst I've ever read. And his points are from the other side of the fence, on the consumer side.
Murphy(c)
How did this post get graded so highly? It is so full of mistakes!
Saying that this is just a "4 year old architecture" simply because PowerVR has been implementing tile based rendering for some number of years would be like saying that the Geforce3 is nothing more than an overclocked TNT!
The Kyro's (i.e. series 3 PowerVR chips) contain many new features, and so can't be considered to be "sped up" versions of their parents.
Simon
insert standard employee disclaimer
Hmm, let's just look at the benchmarks, shall we? Lines in bold, with ***'s on them are the ones that the Kyro II came top in. In all other benchmarks, the winner was the GeForce2 Ultra.
Quake III Arena Performance
'Normal' Settings - 640x480x32
'Normal' Settings - 1024x768x32
'Normal' Settings - 1600x1200x32
MDK2 Performance
Default Settings (T&L enabled) - 640x480x32
Default Settings (T&L enabled) - 1024x768x32
Default Settings (T&L enabled) - 1600x1200x32
UnrealTournament Performance
Minimum Frame Rate - 640x480x32 ***
Average Frame Rate - 640x480x32 ***
Minimum Frame Rate - 1024x768x32
Average Frame Rate - 1024x768x32
Minimum Frame Rate - 1600x1200x16
Average Frame Rate - 1600x1200x16
Serious Sam Performance - Fill Rates
Serious Sam Test 2 Single Texture Fillrate
Serious Sam Test 2 Multitexture Fillrate
Serious Sam Performance - Game Play
Serious Sam Test 2 640x480x32
Serious Sam Test 2 1024x768x32 ***
Serious Sam Test 2 1600x1200x32 ***
Mercedes-Benz Truck Racing Performance
All options enabled - 640x480x32
All options enabled - 1024x768x32
All options enabled - 1600x1200x32
FSAA Image Quality and Performance
Serious Sam Test 2 640x480x32 (4 Sample FSAA) ***
Serious Sam Test 2 1024x768x32 (4 Sample FSAA) ***
You can draw your own conclusions, but I think I'll keep saving for that GeForce.
"Unique" presumably means that the PowerVR series, which this new card, the Dreamcast's chip and the original PowerVR card all belong to, are the only ones to go down the tile-based route.
This is because it's a phenomenally quick render method when designed for, but
(a) it takes a big hit to do stuff the way every other 3d card on the market does things (and guess which method is going to get used by a developer writing for a platform where either might be in place), and
(b) if you are used to doing things the 'normal' way it's a pain in the rear to try and re-jig your code into a tile-based format. You might as well rewrite the engine from the ground up.
Of course, if (as with the Dreamcast) you're writing explicitly for a tile-based platform then it kicks arse for the money.
"I Know You Are But What Am I?"
And if you'd read the article, you'd see that this card does achieve FSAA at a decent resolution with very good performance, and that the quality of the memory architecture is what really makes it compare well, by massively reducing the number of memory accesses.
If you'd read the article you'd have seen that they are releasing a lower power version based on the same architecture as well, and suggested a price of around $79 for it. But if you're looking for cheap machines for office applications, you should be looking at something with integrated chipsets instead. It's not like you'd normally put a 3D accelerated graphics card in a machine that is only intended for word processing or similar.
So you should still see a significant benefit - not as much as for opaque areas, though, as it can't just throw away the partially obscured pixels as it can with the totally hidden ones.
--
...or am I missing something?
This is the Voodoo 3 2000, right? I thought the Voodoo was highly dependent on the CPU speed. This will be going into an old PPro 200. I will look into it, thanks.
Waltz, nymph, for quick jigs vex Bud.
..I will wait for the obligatory Mr. Carmack response modded to +5. I'm hoping he's busy writing it now :)
Praying for the end of your wide-awake nightmare.
On the flip side of this, could tile-based rendering be implemented for the very lowest segment of the video card market: PCI cards for legacy desktops? Wouldn't tile-based rendering at least partially minimize the performance hit from using PCI as opposed to AGP?
I'd like to find an inexpensive PCI card to replace the 2MB Mystique in my old PPro200... I guess there wouldn't be much of a profit margin, however.
Waltz, nymph, for quick jigs vex Bud.
This design is very similar (if not the same) as the NEC's PowerVR and PowerVR2 chipsets.
Kyro IS a PowerVR chip. Read before you comment.
Nah - people have designed graphics chips that hit 'perfect' fill rates before - I know I did one (for the Mac 7-8 years back) that hit 1.2Gb/sec into VRAM (then state of the art DRAM) exactly as it was designed to.
Graphics chips have a relatively long history that is at least in part driven by the commodity memory technologies they have available to them. These days we're particularly troubled - system costs are going down, DRAM speeds haven't kept pace with CPU/GPU speed increases (CPUs have maybe gone from 100MHz to 1GHz in the time that memory has gone from 66MHz to 266MHz [transfer rate - latencies have only halved]).
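For a rough sense of the gap, the ratios can be worked out from the clock figures quoted in the paragraph above. This is only a sketch of that arithmetic; the variable names are made up, and the "latency improved 2x" value simply restates "latencies have only halved".

```python
def growth(old, new):
    """How many times larger the new figure is than the old one."""
    return new / old

cpu = growth(100, 1000)        # CPU clock, MHz
mem_transfer = growth(66, 266) # memory transfer rate, MHz
latency_gain = 2.0             # "latencies have only halved"

print(f"CPU clock: {cpu:.1f}x, memory transfer rate: {mem_transfer:.1f}x, "
      f"latency: {latency_gain:.1f}x")
print(f"CPU has outgrown memory transfer rate by {cpu / mem_transfer:.1f}x")
```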
'Tricks' like ISS (aka tiled frame buffers) work because they basically cache the problem - at the expense of keeping an ordered polygon list (which means that you are more sensitive to scene complexity - too many more polys than pixels and you might be in big trouble) and latency (because you can't finish the poly sort stage before you start rendering - so you have to render a complete screen at once - while maybe buffering the next scene's polys in parallel) - note I'm oversimplifying the problems here to explain some of the issues - there's lots of scope for smart people to do smart things in a space like this (before all the patents are granted - then without competition innovation will probably cease :-( )
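The "ordered polygon list" can be pictured as a per-tile bin of triangle indices that has to be complete before any tile is rendered. Below is a minimal sketch of that binning step using conservative bounding boxes. The tile size, data layout, and function name are assumptions chosen for illustration, not details of any real chip.

```python
from collections import defaultdict

TILE = 32  # hypothetical tile size in pixels, not taken from any real chip

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the triangle's bounding box to the tiles that exist on screen.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins

triangles = [
    [(5, 5), (20, 10), (10, 25)],    # fits inside one tile
    [(20, 20), (70, 30), (40, 60)],  # spans several tiles
]
bins = bin_triangles(triangles, 128, 128)
print(sorted(bins.items()))
```

The storage cost grows with the number of triangle-tile pairs, which is one way to see why scene complexity matters more here than on an immediate-mode card.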
Since tile based rendering eliminates overdraw, the effective fill rate of a tile based renderer can actually surpass the effective fill rate.
Wow! They can make the effective fill rate surpass the effective fill rate?! Maybe they can make my bank account balance surpass my bank account balance!
Disconnect your television. Do your own research. Draw your own conclusions. They're probably lying. Don't be a sheep.
hehe, dude...slow down a bit. Before they get X drivers going, they first need to figure out all the bugs for Windows drivers. I read a review over at Tom's about a tile based renderer, and they tend to have issues drawing some things due to programming style, etc. (someone said something about zbuffer, etc)
I would LOVE to see this with T&L and Highbandwidth memory. If they can do well with these and fund further development to get a DDR version with T&L we might have some competition for the GF3 next year.
Of course The Carmack has spoken and does not agree with Tile Based rendering right now; at its core it is kind of a kludge.. hrm..
I wonder what he thinks of that Anandtech article.
Oh great and powerful Carmack, we ask that you can grace us with your knowledge and wisdom in this time of confusion and shed light on the validity of tile based rendering. Hear us!
Sigs are awesome huh?
This is an instance of the old ATM vs IP or CISC vs RISC debates. It's the old engineering tradeoff: work smart but slow or work quick and dirty. Tile based rendering is an instance of smart and slow, i.e. they do no more work than they have to, and thus get away with slower clocks and memory. The NVIDIA card is quick and dirty.
Historically, it is almost always the case that quick and dirty is the cheaper way to go, as it allows economies of scale to come into play. However, it is seeming more and more like the memory bandwidth bottleneck is here to stay, so the smart and slow approach is looking pretty good. Likewise as we run into physical limitations for network bandwidth, IP is going to have a harder and harder time to provide acceptable QoS and multicast solutions and ATM-like technologies will start becoming more prevalent.
Does anyone actually still buy complete PC's?
I mean you get ripped off, with non upgradeable junk unless you build it yourself.
And unless you build it yourself, when it breaks you usually have to take it somewhere to fix it.
Build it yourself, buy the cards, mb, cpu, ram, hd. and enjoy.
The benchmarks show 350M pixels/s rendered on a 175MHz chip with two pipelines. I don't think anyone in the PC graphics industry has ever accomplished that. (I believe the VooDoo and other really early cards were held back by time to set up all the polys on the CPU)
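The point is that the measured number equals the theoretical peak. A small sketch of that arithmetic, using the figures from the comment and assuming one pixel per pipeline per clock (the function names are illustrative):

```python
def peak_fill_rate(clock_mhz, pipelines):
    """Theoretical peak in Mpixels/s, assuming one pixel per pipeline per clock."""
    return clock_mhz * pipelines

def utilization(measured_mpixels, peak_mpixels):
    """Fraction of the theoretical peak actually achieved."""
    return measured_mpixels / peak_mpixels

peak = peak_fill_rate(175, 2)  # Kyro II: 175MHz, two pipelines
print(peak, utilization(350, peak))
```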
Second, the point stands that this is quite new to the scene and that more bandwidth won't help.
BTW, thanks for the info.
Try a V3-2000. Should be dirt cheap these days, and the PCI version is as fast (er, slow?) as the AGP version...
(I get a solid 60fps in UT, on a Duron 750 machine)
--
--
Don't like it? Respond with words, not karma.
You're forgetting that this card does per-pixel sorting so all your alpha effects will work correctly, which is non-trivial to achieve on traditional architectures. It's much more fun to work with alpha blending on a PowerVR than on a 3Dfx/nVidia/Matrox/PS2. But some multipass effects are harder to achieve since you do not have full control over the order in which your triangles are rendered; remember, multi-pass != alpha blending.
This is an impressive card, no matter how you look at it. It makes me wonder, what would occur if this came into play in the laptop/integrated market. The card is obviously cheap, when you look at the Geforce2go which is basically a Geforce 2MX with a lower power consumption, it is in the same price bracket. It also only dissipates 4 watts. For a card this powerful, cheap and cool, why isn't anyone thinking of these markets to push the chip?
WikiAfterDark.com It's a sex wiki, go now!
"Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
You've got to be shitting me. ATI? They are the lamest, nonexistent driver, non-3d 3d card manufacturer. I just wasted a weekend trying to get Half-life running under W2k. Their OpenGL implementation is so poor the game performs best in SOFTWARE RENDERING.
An ex-colleague of mine left to go work in ST Micro's drivers department about a year ago. I looked him up not long after when the original Kyro was released; at the time, he was doing driver development for the KyroII. He mentioned then that ST Micro were working on a Kyro-with-T&L part, but didn't mention any ship dates. I did get the impression that it wasn't going to be that far behind the KyroII though. So, we might have a T&L enabled card sooner rather than later, which will be pretty sweet. In the meantime, I think the KyroII will be the perfect stopgap between my Geforce 1 DDR, which is starting to look a little long in the tooth, and a Geforce 3, which will be an almighty card once DirectX 8 has some proper software support.
You win again, gravity!
According to the anandtech benchmarks:
QIII Arena 1024x768 @32bpp
GeForce2 GTS 64MB: 95.6fps
Radeon DDR 64MB: 80.6fps
That's a quite significant 15fps.
Q3 at 16x12 is unplayable on everything except the Ultra, but the GTS2 still wins.
MDK 1024x768 @32bpp
GeForce2 GTS 64MB: 105.9fps
Radeon DDR 64MB: 86.8fps
Again, about 19 more fps at this res.
MDK 1600x1200 @32bpp
GeForce2 GTS 64MB 43.3fps
Radeon DDR 64MB: 38.2fps
Only 5fps faster, but that's around 13% faster.
Unreal Tournament 1024x768 @32bpp (avg)
GeForce2 GTS 64MB: 84.5fps
Radeon DDR 64MB: 87.8fps.
Here the Radeon wins, but only by 3fps.
Unreal Tournament 1600x1200 @32bpp (min)
GeForce2 GTS 64MB: 34.3fps
Radeon DDR 64MB: 18.8fps
Ouch. What were you saying about high resolutions?
The GTS is playable, the Radeon is not.
Unreal Tournament 1600x1200 @32bpp (avg)
GeForce2 GTS 64MB: 68.9fps
Radeon DDR 64MB: 56.9fps
The GTS is 12fps faster here.
Serious Sam 1024x768 @32bpp
GeForce2 GTS 64MB: 47.2fps
Radeon DDR 64MB: 50.1fps
The Radeon wins, but it's only 3fps faster.
Serious Sam 1600x1200 @32bpp
GeForce2 GTS 64MB: 22.5fps
Radeon DDR 64MB: 24.7fps
A hair over 2fps faster.
Mercedes-Benz 1600x1200 @32bpp
GeForce2 GTS 64MB: 20.9fps
Radeon DDR 64MB: 24.2fps
The only decisive victory for the Radeon. Still, at the only playable resolution (640x480) the GTS wins 64.7 to 57.8.
So overall, the Radeon is a good card, but NVIDIA still has a significant speed advantage, and for only a little bit more, is worth it, in my opinion. (Not to mention the fact that they have better drivers and pro-caliber OpenGL!)
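For anyone who wants to check the margins, here is a short sketch that recomputes the differences from the frame rates listed above. The numbers are copied from this comment (not re-read from the AnandTech article), percentages are taken relative to the Radeon score, and the helper name is made up.

```python
# (benchmark, GeForce2 GTS 64MB fps, Radeon DDR 64MB fps), as quoted above
results = [
    ("Quake III Arena 1024x768", 95.6, 80.6),
    ("MDK 1024x768", 105.9, 86.8),
    ("MDK 1600x1200", 43.3, 38.2),
    ("Unreal Tournament 1024x768 (avg)", 84.5, 87.8),
    ("Unreal Tournament 1600x1200 (min)", 34.3, 18.8),
    ("Unreal Tournament 1600x1200 (avg)", 68.9, 56.9),
    ("Serious Sam 1024x768", 47.2, 50.1),
    ("Serious Sam 1600x1200", 22.5, 24.7),
    ("Mercedes-Benz 1600x1200", 20.9, 24.2),
]

def compare(gts, radeon):
    """Return (fps difference, percent difference relative to the Radeon)."""
    diff = gts - radeon
    return round(diff, 1), round(100.0 * diff / radeon, 1)

for name, gts, radeon in results:
    diff, pct = compare(gts, radeon)
    print(f"{name}: GTS {diff:+.1f} fps ({pct:+.1f}%)")

gts_wins = sum(1 for _, gts, radeon in results if gts > radeon)
print(f"GTS ahead in {gts_wins} of {len(results)} tests")
```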
A deep unwavering belief is a sure sign you're missing something...
Space invaders *SO* fast on this card, like 23000 FPS
--
Je t'aime Stéphanie
I'm looking forward to a version of this card with T&L on it. It managed to keep up in most tests...except the ones where T&L was actually used. Anything used in my DC is good in my comp too :)
Can you say Dreamcast? I knew that you could.
By the way, did you know you can use the Dreamcast Broadband Adapter to connect to your PC for some do-it-yourself development? Very cool...
Their clock is much much faster than any clock I've seen today.
What is pirate software? Software for inventory of stolen treasure?
If the poster had read the benchmarks, it would be obvious that the case is not so cut and dry. The card wins at some things, loses at others. It loses to the GF2GTS in some benchmarks, and beats the GF2 Ultra in others. A very cool card, and worlds beyond anything in its price range, however. This should do very good things to the low price range performance market as a whole, by pushing down other prices and by providing a cool new technology.
I realize that the Kyro offers a very good price/performance ratio, but why don't they offer a model (for a higher price, obviously) that had higher memory clocks? This way, those who wanted to pay for more performance could do so, and they could continue to sell their current cards at their competitive price.
Isn't this why the GeForce 2 Ultras even exist? Some people always want the fastest cards, and are willing to pay premiums to be on the bleeding edge... my guess is that the "bleeding edgers" will reap a higher percentage profit on each unit...
This actually sounds pretty damn cool, and with a little luck will provide some nice competition for nVidia. Since 3Dfx went bye-bye, I have been a little worried that nVidia would be the only real gaming card supplier (well, I guess that depends on if you count ATi).
"Useless organic meatbag" -HK-47
My next upgrade will be the video card. I've been interested in AA ever since I heard it was available on a video card. If you check out the article, this new card has better AA performance than the Geforce 2 Ultra.
Very interesting.
Good thing I have to wait a few months anyways.
Later
ErikZ
Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
Tile-based rendering's big benefit is that it reduces overdraw to 0; that is, each opaque pixel on the screen is drawn exactly once. Performance for certain types of scenes is spectacular.
Dreamcast uses this, as well as many of Sega's arcade systems (HOTD2, for instance), which use the same PowerVR2 rendering system.
Where tile-based rendering falls down, however, is for scenes that contain a large amount of alpha-blended areas. Alpha-blended areas in today's hardware are necessarily drawn multiple times, from back-to-front, to accomplish transparency effects. Having to draw the pixel several times nullifies the zero-overdraw benefit of tile rendering. Since most tile-rendering systems trade fill-rate for zero overdraw, cards with insufficient fill rate for large alpha areas (read: all of them) fall down on large, alpha blended polygons. You can see this in House of the Dead 2 when fighting the Hierophant; if you get enough water splash effects on the screen, the frame rate chokes.
Tile rendering works extremely well for areas that are opaque, or use only small alpha-blended areas. It's getting better; it's just not perfect yet.
Mumbly Joe
I know this is a shameless plug, but I spent all weekend working on ethernet, and I sent my friends a couple of e-mails via a telnet session (under a BusyBox filled initrd) from my Dreamcast :). But seriously, we need more kernel hackers in there so we can spit out more drivers....
Back on topic, the LinuxDC framebuffer writes from CPU RAM directly to PVR2 RAM, which is about as slow as you can get. I ran a simple SDL parallax scrolling example, and the results were, shall we say, CRAP :). I've started thinking about how to accelerate the FB using the PVR2's Tile Accelerator, but I'm not that familiar with its internals or how tile-based rendering would work (yet). If anyone there can point to some TA-based resources in general, please do - there are a few good docs linked from julesdcdev, but I was thinking more general TA docs (e.g. not Dreamcast-specific).
We *need* interested developers, testers, and authors, to stop by LinuxDC (we're also in the process of restructuring our site), as we're finally starting to get the ball rolling...
M. R.
Not for the serious gamer perhaps, but this is just another card that is completely overpowered, and therefore overpriced, for development and office purposes.
Personally, I would like to see an emphasis on increasing any given video adapter's efficiency and decreasing its price before increasing its power.
While it may be that the PowerVR2 did not implement it correctly, there is nothing that prevents performance much better than immediate mode style rasterizers. Consider it this way:
A game needs to draw 5 opaque polygons, with 3 alpha polygons on top.
An immediate mode rasterizer would have to write all five polygons to memory, including all of the associated texture lookups and lighting calculations. Then, for each alpha polygon, it would have to reread bits from the framebuffer and combine it with the shaded textured alpha polygon. This is a lot of memory traffic.
A tile based renderer, otoh, would not need to do all of this. Obviously it would be able to eliminate all of the overdraw on the opaque polygons, but it would also be able to do the blending in the ON CHIP 24bit tile framebuffer, which is much much much faster than going to off chip memory. This means that instead of having to do read-modify-write off chip memory cycles for each of those alpha blended polygons, it stays on chip.
Now like I said before, I am not familiar with the PowerVR2 chip, and it may be that they do not implement this obvious optimization... I would assume their newer chip would.
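The five-opaque, three-alpha example can be turned into a crude per-pixel count of off-chip framebuffer accesses. This is only a sketch under loud assumptions: every polygon covers the pixel, Z-buffer and texture traffic are ignored, and the tile renderer is assumed to blend entirely in its on-chip tile buffer and write the finished pixel out once. The function names are invented.

```python
def immediate_mode_accesses(opaque, alpha):
    """Off-chip framebuffer accesses for one pixel covered by every polygon.

    Each opaque polygon costs one write; each alpha polygon costs a read
    of the existing color plus a write of the blended result.
    """
    return opaque * 1 + alpha * 2

def tile_based_accesses(opaque, alpha):
    """Assumes all overdraw and blending happen in the on-chip tile buffer,
    so only the final color is written to off-chip memory."""
    return 1

print(immediate_mode_accesses(5, 3), tile_based_accesses(5, 3))
```

Whether a given chip actually keeps the blending on chip is exactly the open question raised above, so the second function describes the best case.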
My big question is "why not a T&L unit?" It seems like a severe handicap to an otherwise stellar chip. Although somewhat addressed in the article, they didn't really justify it well, and the benchmarks prove it would be handy. Maybe the 175MHz clock is what prevents an effective T&L unit from being added...
-Chris
All in all, this is bad news for ATI. They're losing their OEM business to nVidia not only in low cost PC's but in Macs as well. They decided to reinvent themselves with the Radeon's swank environmental bump-mapping and stuff, a high-end 2d card for graphic designers who fired up Quake on the office LAN after hours. This would (they hoped) put them in the #2 spot and help ATI move into the 3d gamer market. But looking at the benchmarks for the Kyro II, the new chip beats the DDR Radeon in several benchmarks, impressive considering the newcomer's lack of T&L rendering. Unless the Kyro has horrible image quality, I would guess ATI is not pleased.
* I realize that Power VR et al have been around for years making chips for consoles and arcade games. So was nVidia before the riva 128; I'm talking about entry into the PC graphics card market.
Today's sig brought to you by http://www.swankypimp.com
I understood that the goal of tile based rendering was that the tiles could be divided between multiple CPUs so the tiles could be rendered in parallel. Or is this just the future of tile based rendering? Graphics chip designers really have an advantage over CPU designers: they can easily provide enough registers on their GPUs as well as very small instruction sets. Lucky bastards.
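Since tiles are independent of each other, splitting them between chips is conceptually simple. A minimal sketch of one possible scheme, round-robin assignment, follows; the grid size, chip count, and function name are hypothetical, and nothing here is claimed to match how any multi-chip PowerVR board actually divides the work.

```python
def assign_tiles(tiles_x, tiles_y, chips):
    """Round-robin tiles across chips; returns {chip: [(tx, ty), ...]}."""
    assignment = {chip: [] for chip in range(chips)}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            chip = (ty * tiles_x + tx) % chips
            assignment[chip].append((tx, ty))
    return assignment

# Hypothetical 20x15 tile grid shared between 4 chips.
assignment = assign_tiles(20, 15, 4)
print({chip: len(tiles) for chip, tiles in assignment.items()})
```

Interleaving tiles like this tends to spread busy screen regions across chips, whereas giving each chip one contiguous quarter of the screen could leave one chip with most of the work.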
Spring is here. Don't believe me, look outside!
So what are you saying? Tile-based rendering is the work of Satan?
Check out
freenet:CHK@qANifG8baVSFWd-ZsW5kvFVjcwcOAwE,ZXRUsp PkxMFRzwRsJdrpqg
Got friends?
My understanding about the GeForce 3 is that any old game can take advantage of it out of the box: for example, actually being able to use FSAA at a decent resolution, and of course, faster frame rates through the use of massive quantities of transistors and a more efficient memory architecture.
Will this render porn more clearly?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~
--- -- - -
Give me LIBERTY, or give me a check.
--
...or am I missing something?
That's actually one of the big benefits of the PowerVR architecture. I believe there are some Sega arcade boards that use two or more PowerVR chips.
Here's how it works:
Anyway, because the system uses ZERO memory bandwidth for Z-buffer calculations, the system is far more efficient, even though it is essentially traversing the scene dozens of times for each frame.
This is why the Sega Dreamcast is often able to have better performance than the Playstation 2.
Cryptnotic
My other first post is car post.
There is a good article on it, as applied to the powervr (which is using the same kind of architecture) at http://www.ping.be/powervr/PVRSGRendMain.htm. As others already said, you can see the results on the Dreamcast, or on the arcade version, the Naomi.
The strengths are obvious:
The weaknesses are a little less obvious:
As a result, these cards are nice, but mostly represent another set of tradeoffs, not necessarily a revolution.
OG.
Sounds like a neat low-end solution, but I'm always suspicious when the evangelists have to spread FUD like:
"Also included in the Kyro II is 8-layer multisampling that allows for up to 8 textures to be applied in a single pass. Other cards are forced to re-send triangle data for the scene being rendered when multitexturing, eating up precious memory bandwidth. Since the Kyro II features 8-layer multisampling, the chip can process the textures without having to re-send the triangle information."
Guys, if the chip is all that, let it stand out on its virtues alone. Your competition has been multitexturing since the Voodoo II.
And of course:
"Missing from the Kyro II feature set is a T&L engine. Claiming that the current generation of CPUs are far superior at T&L calculations than any graphics part can be, STMicroelectronics choose to leave T&L off the Kyro II."
I could sneeze at this point and mutter the appropriate profanity under my breath. However, I'd much rather see the chip succeed or fail because of its feature set, instead of the ability of Imagination/STMicroelectronics at slinging mud at the competition.
Those benchmarks are really interesting. It would be fantastic to have a successor to 3Dfx, if only to keep Nvidia and ATI on their toes. My chief worry towards their commercial acceptance would be how much of DirectX 8 do these guys support? It's not a fair worry, but I think it's a realistic one. I wish them the best of luck.
If I remember correctly GigaPixel's architecture was also Tile based, and I believe they had spent quite some time trying to head off the known issues with Tile architectures (though I honestly don't know how successful they were - the demos I saw were a while ago and looked good but things have changed since then).
Of course GigaPixel was acquired by 3dfx for approx. 300 Million US$ after initially winning the Xbox graphics contract and then having it pulled from beneath them. And of course 3dfx was in turn acquired (though for only 150-160 Million US$ ?) by nVidia. So if Tile based rendering has a future (and GigaPixel's is good) perhaps we can expect to see it from nVidia too before long.
Erm.. the whole point of this is that it doesn't *need* DDR. Adding DDR to it would *not* increase its performance whatsoever.
That said, adding four of these inline and jumping to DDR would be decidedly sweet. The chips are fairly small, which would facilitate this, but I'm not sure if they are capable of that... since they just work on tiles, I can't see why you couldn't assign each a section of the scene, but... who knows.
It will be quite a while before hardware T&L comes out on these, I think, considering that this iteration is only just being released.
---
Video meliora proboque deteriora sequor - Ovidius
If you want to find out what is amazing about this card, read on: This card is based on NEC's PowerVR architecture, and is really nothing more than the PowerVR2 clocked up to 175 MHz. What's funny is, I remember getting excited about this card over 3 years ago!! If you want to do more research on the architecture, dig up some old articles on Tom's Hardware, where he benches it with Quake 1. At the time, the card was supposed to clean up the market, and it was going to debut at 125 MHz core/memory speed. (This was at the time when the Voodoo1 was the standard, and the Voodoo2 had just entered the scene. I remember holding out for this card, and simply settled on a TNT when I found out that NEC had decided to drop out of the PC market.) Then NEC made a deal with Sega, and put the chip in the Dreamcast. What's even more amazing about the chip is that ST simply had to change the clock to 175 MHz to make it competitive with nVidia's GeForce2 Ultra. What I think will be scary is when they revamp this 4 year old chip design and add T&L. Imagine what a chip like this could do with DDR RAM instead of SDRAM. This current chip only supports SDRAM, which is why they didn't put DDR RAM on the card. I think nVidia has their work cut out for them. Hopefully they will be able to license tile based rendering for their next card. I was really hoping that they would put it in the GeForce3; it would have made quite a bit greater difference than a crossbar memory architecture.
Is anyone actually still buying PC cards?
This design is very similar to (if not the same as) NEC's PowerVR and PowerVR2 chipsets.
That's because the Kyro/Kyro II use the PowerVR3 architecture. NEC used to partner with Imagination to produce those older chips.
-----
#o#
#o#
O Moo.
I read somewhere that DirectX-8 is going to further abstract the z-buffer out of the programmer's hands. The article from which I gleaned this tidbit was exceptionally poorly written and I've not backed this up with real research. Perhaps this was a bit of forward-thinking on NEC's part?
While it may be that the PowerVR2 did not implement it correctly, there is nothing that prevents a tile based renderer from getting much better alpha blending performance than immediate mode style rasterizers. Consider it this way:
A game needs to draw 5 opaque polygons, with 3 alpha polygons on top.
An immediate mode rasterizer would have to write all five polygons to memory, including all of the associated texture lookups and lighting calculations. Then, for each alpha polygon, it would have to reread bits from the framebuffer and combine it with the shaded textured alpha polygon. This is a lot of memory traffic.
A tile based renderer, otoh, would not need to do all of this. Obviously it would be able to eliminate all of the overdraw on the opaque polygons, but it would also be able to do the blending in the ON CHIP 24bit tile framebuffer, which is much much much faster than going to off chip memory. This means that instead of having to do read-modify-write off chip memory cycles for each of those alpha blended polygons, it stays on chip.
Now like I said before, I am not familiar with the PowerVR2 chip, and it may be that they do not implement this obvious optimization... I would assume their newer chip would.
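To put rough numbers on the 5-opaque, 3-alpha example above, here is a minimal back-of-envelope sketch in Python. It is not how any real chip accounts for traffic. The function names, the 4 bytes per pixel, and the worst-case assumption that every polygon covers the same pixels and arrives back to front (so nothing is rejected by a z-test) are all made up for illustration. It also ignores z-buffer traffic, texture fetches, and whatever the tile renderer spends storing its scene data.

```python
def immediate_mode_bytes(opaque_layers, alpha_layers, pixels, bytes_per_pixel=4):
    """Framebuffer traffic for a hypothetical immediate mode rasterizer.

    Assumes worst case: every opaque layer is written to off-chip memory,
    and every alpha layer does a read-modify-write on the framebuffer.
    """
    opaque_writes = opaque_layers * pixels
    alpha_reads = alpha_layers * pixels
    alpha_writes = alpha_layers * pixels
    return (opaque_writes + alpha_reads + alpha_writes) * bytes_per_pixel


def tile_based_bytes(pixels, bytes_per_pixel=4):
    """Framebuffer traffic for a hypothetical tile based renderer.

    Assumes overdraw removal and blending both happen in the on-chip
    tile buffer, so each pixel is written to off-chip memory once.
    """
    return pixels * bytes_per_pixel


if __name__ == "__main__":
    pixels = 1  # per-pixel cost; scale up for a real screen area
    print("immediate:", immediate_mode_bytes(5, 3, pixels), "bytes per pixel")
    print("tiled:    ", tile_based_bytes(pixels), "bytes per pixel")
```

Under those assumptions the immediate mode path moves 44 bytes per covered pixel against 4 for the tiled path. A real immediate mode card with a z-buffer and front-to-back sorting would land somewhere in between.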
My big question is "why not a T&L unit?" It seems like a severe handicap to an otherwise stellar chip. Although somewhat addressed in the article, they didn't really justify it well, and the benchmarks prove it would be handy. Maybe the 175 MHz clock is what prevents an effective T&L unit from being added...
-Chris
It's always amusing how people will COMMENT on a story without reading it. Maybe if it was open-source, slashdotters would be more interested, I dunno. I don't know why slashdot doesn't link you directly to the article before letting you post a comment or read comments... i.e. you guys could just open a frame or make a local cache of the article perhaps? Regardless, this thing is damn exciting because you get huge BANG for the buck. It's the only video card where you'll get more than you paid for.
Distinguish between tile based rendering in general and the PowerVR2 as a pioneering early example (in consumer space, old hat for flight sim visuals).
Cards that have less fill-rate are going to do worse on scenes with more depth complexity.
Sabre understands. Fill rate measures ability to write to a frame buffer. If you only write to each pixel exactly once, depth complexity doesn't come into it. Compared to a classic card, the overhead is the memory bandwidth associated with the bucket sorting. You can be clever and do some depth culling on the bucketed fragments to reduce their depth complexity (but this isn't too good as triangles decrease in size).
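For anyone wondering what the bucket sorting looks like, here is a minimal sketch in Python, assuming a simple bounding-box test. The tile size, the function name, and the data layout are hypothetical and not taken from any PowerVR or Kyro documentation. Each triangle's index is appended to the list of every screen tile its bounding box touches, which is the extra memory traffic mentioned above.

```python
def bin_triangles(triangles, width, height, tile=32):
    """Bucket triangles into screen tiles by bounding box.

    triangles: list of three (x, y) vertex tuples in screen space.
    Returns a dict mapping (tile_x, tile_y) to a list of triangle indices.
    The bounding box is conservative: a tile may list a triangle that
    does not actually cover any of its pixels.
    """
    cols = (width + tile - 1) // tile   # number of tile columns
    rows = (height + tile - 1) // tile  # number of tile rows
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, in tile units.
        x0 = max(0, int(min(xs)) // tile)
        x1 = min(cols - 1, int(max(xs)) // tile)
        y0 = max(0, int(min(ys)) // tile)
        y1 = min(rows - 1, int(max(ys)) // tile)
        # Fully off-screen triangles produce empty ranges and are skipped.
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins


if __name__ == "__main__":
    tris = [
        [(1, 1), (10, 1), (1, 10)],      # fits inside one tile
        [(20, 20), (40, 20), (20, 40)],  # straddles four tiles
    ]
    for key, indices in sorted(bin_triangles(tris, 64, 64).items()):
        print(key, indices)
```

Once the bins are filled, each tile can be resolved on chip from its own short list, so the per-tile work depends on how many triangles land in that bucket, not on how many times each pixel would have been overwritten in external memory.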
Before I even start, IANAHSGP (hot shot graphics programmer). While reading the article and learning how this chip does its thing, I had the sinking sensation that it's simply doing things the way old software renderers used to do it (especially the good old demos by Future Crew and friends).
The whole concept of wasteful polygon rendering comes from early hardware acceleration, which needed a simplified rendering scheme in its early days in order to produce hardware that is cheap enough to mass-market. They simply didn't have the means to put a pseudo-raytracer on a chip fast enough to play games, and the bottlenecks were elsewhere anyway (host CPU and bus).
Nevertheless, it's a smart deviation from the herd and sounds like good competition against the GeForce MX (which is more like a TNT2 on steroids) as well as ATI's mid-range Fury and low-range Radeons that are gaining popularity in the bargain performance market.
-Billco, Fnarg.com
I don't know if tile-based rendering actually delivers so much more performance, but for sure it is not new. NEC uses it for its PCX/PowerVR chipsets, which end up in Dreamcast consoles too. One thing tile-based rendering probably delivers over other algorithms is antialiasing at low cost. Anyway, looks like STM is behind those chips as well, just check the links on their homepage (see article).
The article talks about the Windows drivers (complaining a bit about them - I assume they're still in development though). It does mention OpenGL support in the Windows drivers...
Does anyone know if there will be DRI support for this chipset any time soon? One of these days I'll have to upgrade from my old Voodoo Banshee card...
---
"They have strategic air commands, nuclear submarines, and John Wayne. We have this"
Hacker Public Radio is our Friend
It sounds like a nice card and yadda yadda. Are they going to supply X drivers with GLX support? Otherwise the card isn't worth buying.