3D Raytracing Chip Shown at CeBIT

Hardware encoding by BWJones · 2005-03-14 16:12 · Score: 5, Interesting

FPGA is clocked at 90 MHz and is 3-5 times faster in raytracing than a Pentium4 CPU with 30 times more MHz.

I am really not surprised at the performance, as almost any time you build code into hardware it is significantly faster. For instance, I used to have a Radius 4 DSP Photoshop accelerator card in my old 68030 based Mac IIci I bought in 1990 that would run Photoshop filters significantly faster than even my much later PowerPC based PowerMac 8500 purchased in 1996 with faster hard drives and more memory.

The same sort of benefit can be seen in the vector math optimizations that have been built into the G4 and G5 chips with AltiVec.

So, the question is: Can these guys get ATI or nVidia to buy their chip?

--
Visit Jonesblog and say hello.

Re:Hardware encoding by moosesocks · 2005-03-14 16:47 · Score: 4, Interesting

unlikely. the current generation of 3d cards are all polygon-pushers. Direct3D/OpenGL are all about polygons. virtually all raytracing is done by the CPU.

Of course, raytracing produces beautiful results compared to the other methods of 3d graphics, but it is MUCH more expensive in terms of CPU cycles on today's CPUs and non-existent on graphics chips -- the first gfx chips were polygon-based because drawing polygons is indeed easier than raytracing even with specialized hardware. of course, specialized hardware definitely helps polygons as well. my 300mhz/whatever TNT2 can render a scene as fast as the fastest pentiums today can using software rendering.

all of the big renderfarms rely exclusively on the CPU to do their animations. this could change all that. I for one look forward to seeing the potential this has.

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:Hardware encoding by mrgreen4242 · 2005-03-14 16:47 · Score: 4, Interesting

I think there may be a market here. Say, for example, that the next generation of Unreal or Doom engine is designed around something like this. The SOFTWARE vendor could potentially, assuming they could get the cost down far enough, offer some sort of PCI or even better USB2/FW hardware accelerator bundled WITH the game.

Think of it like this... Unreal 4 or whatever the next next gen will be decides to partner up with these guys. They develop an engine that runs at 60fps with amazing graphics, etc. You can buy the USB3 or FW1600 or whatever add-on needed for the game for, say, $50, or a bundle for $75 that has the add-on and the game. The development cycle would be much easier as there is only one type of hardware to worry about, and the consumer would win as they could get the new hotness game without having to drop $300 on a new video card.

It could also serve as an amazingly effective copy protection scheme. Can't very well play the game without the required accelerator.

Seems possible to me.
Re:Hardware encoding by Afrosheen · 2005-03-14 17:02 · Score: 2, Interesting

Those days are over for the average computer user at home, but they're very much alive for corporate users. A lot of CAD and design software requires keys on the workstation AND the server.
Re:Hardware encoding by Anonymous Coward · 2005-03-14 22:00 · Score: 1, Interesting

Yeah, but a large, fast FPGA can easily top $10k.

Which FPGA costs $10k precisely? I'm no expert, but $10k sounds pretty crazy for an FPGA on this side of the 1980s.

Last I checked high end Xilinx FPGAs (virtex 2 pro) were going for under 50$USD in quantity, and that was a loooong time ago in terms of the FPGA market.

Prices have been dropping for these parts for some time.

These guys are quoting ~ 60$USD for the new Virtex 4 (albeit the two low end models) in quantity.
http://www.cimdata.com/newsletter/2004/37/02/37.02.21.htm

I saw someplace (FPGA Journal?) that some Xilinx Spartan 3 were going for something like 2$USD in quantity.

No doubt the present offerings from Altera (in the Stratix 2, and the Cyclone) are competitive.

Reconfigurable logic is fast becoming commonplace in consumer electronics. Previously you wouldn't dream of putting an FPGA in something you'd have to sell a million of, but nowadays that proposition is not that far out of reach. For some items it's already cost effective (set top boxes for example).

Actually if you do a little googling you'll find a couple of big names in the computing world researching and developing reconfigurable computing platforms (one, as I recall, has an office in Redmond). I wouldn't be surprised if big names in PC graphics cards were already throwing money at it.
Re:Hardware encoding by RichardX · 2005-03-15 00:03 · Score: 2, Interesting

The thing with dongles is they're expensive, they're a pain in the arse for users, and they still don't stop a determined cracker.

At the end of the day, there comes a point where the software checks its key/dongle/word from the manual/price of fishcakes in japan and asks itself "So, am I allowed to run?" and all you need to do is ensure the answer to that is always "Yes!"

--
Curiosity was framed. Ignorance killed the cat.
Re:Hardware encoding by orasio · 2005-03-15 00:18 · Score: 2, Interesting

unlikely. the current generation of 3d cards are all polygon-pushers. Direct3D/OpenGL are all about polygons. virtually all raytracing is done by the CPU.

They haven't been just polygon pushers for a long time now. You should watch Doom3, maybe.

For example, with nvidia shaders 2.0 on an nvidia 5900, some people built a photon mapping kind-of-realtime renderer. I'm sure that with the new cards it could be actually realtime.

For your information, ray tracing is far, far from providing the best results, partly because of its failure to represent complex diffuse-specular interactions (caustics being the most visible difference).

Photon mapping is veeeery slow, but comprises two phases, so it /could/ be provided half baked for games, with the 3d card performing the last phase.
Re:Hardware encoding by daVinci1980 · 2005-03-15 05:36 · Score: 2, Interesting

There are a few pieces of information here that you're missing, so I figured I'd drop in my $0.02.

while overdraw is a major hurdle for traditional 3d cards.
Not lately (as of geForce5K+). There are two very tried-and-true methods of getting around this. One is to lay down a depth-only pass first, which allows the hardware to render at ~2x the normal polygon throughput (which is pretty much infinite at this point anyways), and then reject the hidden color pixels *before shading ever occurs*. Also, there is *a staggering* amount of area devoted to doing z rejection in the most efficient manner possible. A rough front-to-back sort of all non alpha-blended objects will generally result in very good performance in terms of z-complexity. Overdraw is very rarely a problem on modern hardware, or at least it's not a problem that developers cannot address.
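
A toy software model of the depth-only pre-pass and the front-to-back sort described above, as a rough sketch only. It assumes fragments are plain (pixel, depth) tuples with smaller depth meaning nearer, and the function names are made up for illustration; it says nothing about how any real GPU implements z rejection.

```python
def shaded_in_submission_order(fragments):
    """Count fragments that get shaded with a plain z-test, in the order submitted."""
    nearest = {}
    shaded = 0
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
            shaded += 1  # passes the depth test, so it is shaded (maybe overwritten later)
    return shaded


def shaded_with_depth_prepass(fragments):
    """Pass 1 writes depth only; pass 2 shades only fragments at the stored depth."""
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return sum(1 for pixel, z in fragments if z == nearest[pixel])


def shaded_front_to_back(fragments):
    """Idealised per-fragment front-to-back sort, then a plain z-test."""
    return shaded_in_submission_order(sorted(fragments, key=lambda f: f[1]))


# One pixel drawn back-to-front three times, plus one pixel drawn once.
frags = [((0, 0), 3.0), ((0, 0), 2.0), ((0, 0), 1.0), ((1, 0), 5.0)]
print(shaded_in_submission_order(frags))  # 4
print(shaded_with_depth_prepass(frags))   # 2
print(shaded_front_to_back(frags))        # 2
```

A real engine sorts whole objects roughly rather than individual fragments, so the sorted case here is a best case, not a typical one.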

Flexible and robust realistic reflection and refraction is solely in the domain of ray tracing
Only in your limited view of how to perform reflection / refraction / diffraction effects. There are actually quite effective techniques that allow someone to perform fairly accurate reflections and refractions through an arbitrary object at an arbitrary location. It's all about cube-map lookups and scene management.

Plus you're missing the very large problem with raytracing. Whereas typical 3-D rendering techniques are only marginally dependent on viewport size (depending on their specific data set), raytracing is *completely* bound by the number of pixels that have to be displayed. 640x480 is about two-thirds the work of 800x600. 1600x1200 is four times worse than 800x600. 4x anti-aliasing is *4 times* worse than that. Whereas the cost of going from 1024x768 to 1600x1200 with 4xAA on a traditional rendering platform depends on the data set, with raytracing the problem is *automatically* about 10 times harder, EVEN IF YOUR SCENE IS EMPTY.
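
The resolution arithmetic can be checked with a few lines of Python. This is a sketch that assumes one primary ray per sample and treats 4xAA as four samples per pixel, which is a simplification; `primary_rays` is a made-up helper name.

```python
def primary_rays(width, height, samples_per_pixel=1):
    """Primary (eye) rays per frame, assuming one ray per sample."""
    return width * height * samples_per_pixel


vga = primary_rays(640, 480)
svga = primary_rays(800, 600)
uxga = primary_rays(1600, 1200)

print(vga / svga)   # 0.64
print(uxga / svga)  # 4.0
print(primary_rays(1600, 1200, 4) / primary_rays(1024, 768))  # 9.765625
```

The last ratio is where the "about 10 times harder" figure comes from, and it holds no matter what is in the scene.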

Yikes.

--
I currently have no clever signature witticism to add here.

oh yeah by lycium · 2005-03-14 16:19 · Score: 3, Interesting

ray tracing will *so* usher in a new era of realtime graphics when we can do something like 10-50m intersections per second.

it's amazing to me that nvidia have ignored this up until now, their existing simd architecture and memory subsystems can be easily adapted...

all we need now is consumer push!

Re:oh yeah by forkazoo · 2005-03-14 16:51 · Score: 4, Interesting

It's not amazing at all. When nVidia started making 3D accelerators, OpenGL was a mature, common API. Direct X was gaining traction. DCC and game programmers were familiar with the immediate mode APIs, and were making programs that used them.

By making a card that rendered in immediate mode, nVidia had, ya know, a market. If they created a raytracing card, they would have needed to invent a new API to run it. They would have been the only ones with a card that used the API. Because they would have had a very small installed base, nobody would have written programs to take advantage of the API. Other companies have made raytracing accelerators. This isn't new. Most of them have not done incredibly well because there is so little actual use for the product.

Think of it this way... How many programs have you seen written for the 3DFX glide API? So, if you are one of the people who still has a glide card, but it was designed so that it couldn't do OpenGL because it used completely different technology, how useful would it be to you?

Personally, I'd love a card like that, if it was well supported by Lightwave, and had a vibrant developer community, and multiple vendors making cards for the raytracing API, and I was sure it wouldn't disappear soon.
Re:oh yeah by Spy+Hunter · 2005-03-14 18:04 · Score: 2, Interesting

Bah. Raytracing is not required for good graphics. Pixar's Photorealistic RenderMan didn't even have raytracing until version 11, which came out *after* Monsters, Inc.
Raytracers can easily do hard shadows, reflection, refraction, and order-independent transparency. Today's rasterizers can do almost all that too: hard shadows (stencil shadows), and "good enough" reflections and refractions (using environment maps and shaders). Order-independent transparency is a tough one; it can be kludged using shaders, but it is often better simply to work around it.
Realtime raytracing is a dead end, because all of the techniques that make offline raytraced images good (soft shadows, subsurface scattering, caustics, global illumination, etc) are too slow to implement in a real time raytracer. Rasterizing renderers require hacks to simulate many things that raytracing does more naturally, but those hacks run tens if not thousands of times faster than their more physically accurate raytraced equivalents. What those hacks lose in accuracy they gain back in speed, essentially producing more image quality per unit time. And in real-time graphics, time is the most important thing.

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}

FINALLY by Anonymous Coward · 2005-03-14 16:36 · Score: 4, Interesting

Hopefully, this will help FPGAs to get some much-needed exposure. Their potential is obvious to me, as I think it must be to anyone who's been shown some of what they can do. (For example, this wiki article mentions that current FPGAs can achieve speedups of 100-1000 times over RISC CPUs in many applications.)

Every time I hear about the latest beast of a GPU from ATI or NVidia, I can't help thinking what a waste all those transistors are for anything other than gaming, and maybe a couple other applications. We should be putting those resources into an array of runtime-programmable FPGAs! Your computer could reconfigure itself at the hardware level for specific tasks -- one moment a dedicated game console, the next a scientific workstation, etc.

Lest I get too breathless here, does anyone care to inject some reality into this? Are there technological reasons why FPGAs haven't burst into the mainstream yet, or is it something else? Have I misunderstood their potential entirely?

Re:FINALLY by PxM · 2005-03-14 16:54 · Score: 1, Interesting

They're still Not Good Enough. FPGAs are faster than running software on a normal CPU, but they're still not as fast as running on pure hardware. While modern GPUs are programmable, they're still dependent on extreme hardware which is basically tons of simple circuits doing the same few operations. FPGAs are used when the system has to be more flexible than just 1) get vertex 2) transform 3) paint. Places like ATI do use FPGA systems when they are designing the hardware since it has faster turnaround time from design->test->debug than real hardware. However, these FPGA implementations of GPUs tend to be 1-2 orders of magnitude slower than the final hardware.

--
Free iPod? Try a free Mac Mini
Or a free Nintendo DS, GC, PS2, Xbox
Wired article as proof

high quality animation by poopdeville · 2005-03-14 16:36 · Score: 3, Interesting

This is great! I do work with an animation company, and a couple of these bad boys would seriously speed up our render times. The last video our lead artist did had to be rendered below 720x480 because we didn't have six months or a cluster of G5s. We've also been looking at buying time on IBM's supercomputers, but this might end up being cheaper in the long run.

--
After all, I am strangely colored.

Re:Can someone setup a torrent by synthparadox · 2005-03-14 16:38 · Score: 2, Interesting

http://owntracker.com/synth/index.php

Torrent of low quality up, others will come as they finish downloading.

Saarland... by Goonie · 2005-03-14 16:38 · Score: 3, Interesting

It's really interesting to see that this comes from the University of Saarland. Saarland is a rather out of the way part of Germany, near the border with France and Luxembourg.

It's rather pretty in a European countryside kind of way - hills with wine grapes on them, big rivers with boats cruising up and down, and big vegetable gardens everywhere (Germans sure love their vegetable patches) - though I doubt it's the kind of place too many international tourists visit. Not the kind of place you'd expect cutting-edge graphics research either; but then, you find all manner of interesting research in all manner of places. Even Melbourne, Australia :)

Hi to any residents of Saarland reading this - are they holding the German round of the World Rally Championship there this year?

--

Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)

How, Exactly? by Musc · 2005-03-14 16:48 · Score: 2, Interesting

Would you care to enlighten me as to what exactly ray tracing brings to the table, above and beyond what we already get from a state of the art GPU?

Only thing I can think of is that ray tracing would allow us to replace complicated hacks for shadows and reflections with a more natural implementation, but I can't imagine how this will usher in a new era of gaming.

--
Hamsters are at least as feathery as penguins. HamLix

Re:How, Exactly? by The+boojum · 2005-03-14 17:15 · Score: 2, Interesting

How about non-tessellated geometry? You can have high detail curved surfaces without turning everything into a dense polygon mesh. That in turn lowers the memory and rendering requirements so you can apply those resources to proper detail instead.

Or how about global illumination lighting effects? Truly emissive surfaces and area lights? As a hobbyist map maker, I would kill to have an engine that supported these; imagine being able to just tag the sky as an emissive surface and have the entire level lit up and shadowed accordingly without having to painstakingly add bounce lights everywhere and tune them till they looked correct.

Another good argument that I've heard is based on the complexity of the algorithms. GPU style rendering is inherently O(n) in the number of items in the scene. While ray-tracing has a high constant factor, a good ray tracing acceleration structure makes the problem O(log n) -- as the scene grows the time to ray trace gets closer to the time to GPU render it. Admittedly you can do much the same with the GPU: there's a lot of stuff like BSP trees and bounding volume hierarchies and frustum culling that you can do to speed it up, but then you're already applying ray-tracing techniques anyway, and now you're really talking about something that's more of a hybrid with the CPU just doing the ray-tracing part.
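
To put rough numbers on the O(n) versus O(log n) argument, here is a deliberately crude cost model. The constants are invented purely for illustration (ray tracing is given a 1000x higher constant factor), and real renderers depend on far more than scene size, so treat the crossover point as a shape-of-the-curve demonstration and nothing more.

```python
import math


def raster_cost(n, per_item=1.0):
    """Linear model: every item in the scene is touched once."""
    return per_item * n


def raytrace_cost(n, per_step=1000.0):
    """Logarithmic model: one acceleration-structure descent, high constant factor."""
    return per_step * math.log2(n)


def crossover(per_item=1.0, per_step=1000.0):
    """Smallest power-of-two scene size where the linear model stops being cheaper."""
    n = 2
    while raster_cost(n, per_item) < raytrace_cost(n, per_step):
        n *= 2
    return n


print(crossover())  # 16384 with these made-up constants
```

Change the constants and the crossover moves a lot, which is really the whole debate in this thread.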
Re:How, Exactly? by Sycraft-fu · 2005-03-14 20:19 · Score: 2, Interesting

Actually, raytraced shadows look like crap, at least in the classical sense. Classical raytracing is done by tracing rays back from the pixels to light sources. Works fairly well, but you find the images don't look right. A number of things are generally problematic, but the one with shadows is that they come out hard-edged.

You have to use a different kind of rendering to get soft shadows and proper reflection. Radiosity is a popular method. Basically you treat all objects as light sources, after a fashion. Light hits an object, you calculate the reflections from it, then from that and so on. Gives nice, soft, fairly realistic results... But of course is an ironclad bitch in terms of processing time. It's an iterative process, you have to do multiple reflection passes, and the more you do, the more realistic it looks.
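
The "multiple reflection passes" idea can be sketched as a simple iteration of the classical radiosity equation, B_i = E_i + rho_i * sum_j F_ij * B_j. The two-patch scene, its reflectances and its form factors below are made up for illustration, and this is a plain Jacobi-style iteration, not the solver any particular renderer uses.

```python
def radiosity(emission, reflectance, form_factors, passes):
    """Run `passes` bounce iterations and return the radiosity of each patch."""
    b = list(emission)  # pass zero: only the emitters glow
    n = len(b)
    for _ in range(passes):
        b = [
            emission[i]
            + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
            for i in range(n)
        ]
    return b


# Hypothetical scene: patch 0 emits light, patch 1 only reflects it,
# and each patch sees only the other one.
E = [1.0, 0.0]
rho = [0.5, 0.5]
F = [[0.0, 1.0], [1.0, 0.0]]

print(radiosity(E, rho, F, 1))   # [1.0, 0.5]
print(radiosity(E, rho, F, 2))   # [1.25, 0.5]
print(radiosity(E, rho, F, 50))  # approaches [4/3, 2/3]
```

Each extra pass adds one more bounce of light, which is why more passes look more realistic and cost more time.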

Really, when you get down to it, ray tracing isn't really that exciting anymore. As you note, GPUs do a better job now with the tools they have. It is, perhaps, less correct in the mathematical sense and more of a "hack" but that doesn't matter. It's all about making things the most realistic/pretty the fastest for the least cost.

So I agree, neat technology, and probably not useless, but I'm not seeing it for videogames.

Do you know what an FPGA is? by fireboy1919 · 2005-03-14 16:58 · Score: 5, Interesting

You seem to be describing this as if it's some kind of custom hardware with many limitations.

This could not be further from the truth. FPGAs are more flexible than any of their counterparts. FPGA stands for "field programmable gate array," and an FPGA is basically a matrix of memory elements (at the very least latches) connected to gates, each of which is configured to be a particular type of gate via a ROM or something similar.

It's kind of like a chip emulator written in hardware. You may be wondering why we don't use these all the time. First, they're a lot more expensive, bigger, and more power consuming than their one-chip cousins. Second (as if that isn't really enough), they're usually 2-5 times slower than the same logic on a custom chip.

So the big question is why should we use them? What improvements can they give that normal chips can't?

The big gain is when you want to optimize the hardware for a specific application and be able to change it. These were used in high end digital video cards to be able to handle whatever kind of signal is actually output by whatever kind of camera you've got (I can only assume this is still the case, but I stopped keeping track about 2000).

I don't know if the people who wrote this thing take advantage of this idea within their design, but it's a possibility.

--
Mod me down and I will become more powerful than you can possibly imagine!

What kind of FPGA? by brandido · 2005-03-14 17:04 · Score: 2, Interesting

Working with FPGAs, I was quite curious to find out what kind of FPGA they are using - both Xilinx and Altera have some advanced hard functions (such as Multiply Accumulate functions, Block RAM, etc) that seem like they could have a huge impact on the abilities of this board. Unfortunately, after browsing through the links, I had no luck in finding any information about what FPGA they are using. Was anyone able to find this out? Even looking at the pictures of the board, it only shows the bottom side of the board, so it is impossible to see the chip markings!

--
First Falcon-1 to orbit, then Falcon-9. Then I can die a happy man.

raytracing with 350 million polygons? by Speare · 2005-03-14 17:06 · Score: 2, Interesting

Are you sure the Boeing thing was raytracing 350 million polygons? Or just traditional raster pipeline rendering?

See, the reason I ask is, you generally get away from raytracing polygons and raytrace against the actual NURBS or other mathematical surface definitions. That's the point. You don't feed it to simple scan-and-fill raster pipelines.

--
[ .sig file not found ]

Anti-Planet by KalvinB · 2005-03-14 17:18 · Score: 5, Interesting

Anti-Planet Screenshots. Anti-Planet is an FPS rendered entirely using ray tracing. It requires an SSE compatible processor (PIII and above. AMD only recently implemented SSE in their processors). This was out long before Doom 3, runs on systems Doom 3 couldn't possibly run on, and the graphics tricks it does are just now being put into raster graphics based games.

That, along with Wolf 5k inspired me to start working with software rendering. I think ray tracing will eventually be the way real time graphics are rendered in order to keep upping the bar for realism.

Real Time Software Rendering

I'm working on tutorials covering software rendering topics. The tutorials start by deobfuscating and fully documenting Wolf5K, cover some basic ray tracing and are now going through raster graphics since the concepts used for raster graphics apply for ray tracing as well. I'll be returning to do more advanced ray tracing stuff later. The tutorials also cover an enhanced version of Wolf5K written in C++ that is true color and has no texture size limitations.

--
Work Safe Porn

Re:Sweet deal! by forkazoo · 2005-03-14 17:28 · Score: 4, Interesting

In general, yes, lights will be one limiting factor. I'm going to blabber a bit about how complexity grows in raytracing when you move past very simple scenes... Then get to your comment about Doom3.

In the simplest algorithm, assume only point lights, no spots or area lights. Basically, when you are shading a point so you can draw it on screen, you trace a ray from that point to each light. (You may skip lights beyond some cutoff distance; it doesn't matter here.) If the ray hits some geometry on the way to the light, the point is in shadow for that light.
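
A minimal sketch of that shadow test, assuming the scene geometry is just a list of spheres given as (center, radius) pairs. The function names and the sphere-only scene are assumptions for illustration; it also assumes the shaded point and the light are not at the same position.

```python
import math


def ray_hits_sphere(origin, direction, max_t, center, radius, eps=1e-6):
    """True if origin + t*direction (unit direction) hits the sphere for eps < t < max_t."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return False  # the ray's line misses the sphere entirely
    root = math.sqrt(disc)
    return any(eps < t < max_t for t in (-b - root, -b + root))


def in_shadow(point, light, spheres):
    """Trace one ray from the shaded point to a point light and look for blockers."""
    to_light = [l - p for l, p in zip(light, point)]
    dist = math.sqrt(sum(x * x for x in to_light))
    direction = [x / dist for x in to_light]
    return any(ray_hits_sphere(point, direction, dist, c, r) for c, r in spheres)


light = (0.0, 10.0, 0.0)
print(in_shadow((0.0, 0.0, 0.0), light, [((0.0, 5.0, 0.0), 1.0)]))   # True: blocker in between
print(in_shadow((0.0, 0.0, 0.0), light, [((5.0, 5.0, 0.0), 1.0)]))   # False: off to the side
print(in_shadow((0.0, 0.0, 0.0), light, [((0.0, 15.0, 0.0), 1.0)]))  # False: behind the light
```

The `max_t` limit matters: geometry beyond the light must not count as a blocker.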

So, without reflections, or anything cool, just pointlights, and shadows, you will trace
S+L*S rays

where S is the number of scene samples (pixels) that you are shading, and L is the number of lights. The lone S comes from all the rays you trace from the eye point out into the scene in order to figure out which point is visible at which pixel.
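
As a quick sanity check of S+L*S, here it is as a one-line function. The example resolution and light count are arbitrary, and `rays_traced` is just an illustrative name.

```python
def rays_traced(samples, lights):
    """Eye rays (one per sample) plus one shadow ray per sample per light."""
    return samples + lights * samples


# 640x480 with one sample per pixel and three point lights.
print(rays_traced(640 * 480, 3))  # 1228800
```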

If you have lots of reflections and refractions, that's what can really start to slow things down. At your point being shaded, you have to trace a ray each for the reflection, and for the refraction. If the reflection ray then hits another surface which is reflective, you trace another ray to get the reflected reflection, same with refraction. So, in theory, each sample point can spawn two new rays in addition to the rays for shadow tests, and each of those two new rays can result in two more new rays, etc. You basically have to set some limit to how many times you let it recurse, because two parallel mirror planes would take forever to render accurately.

But wait, there's more! (it slices, it dices!) Everything really starts to explode when you throw out hard shadows and hard reflections. If you want everything to be nice and soft and smooth, you basically have to trace lots of rays and average the results. So, instead of each recursion in a shiny refractive scene spawning two more rays, it may need to spawn 20 or 200. Assume a max recursion of 5, and 20 rays being generated by each shading point.
First point traces 20 rays.
Each of those 20 traces 20, for 400.
Each of those 400 traces 20, for 8,000.
Each of those 8,000 traces 20, for 160,000.
Each of those 160,000 traces 20, for 3,200,000 shading sample points at the fifth level of recursion, each of which will need to trace rays to each of the lights to find out whether it is in shadow, possibly many more for soft shadows.

So, 3.2 million times Lights times soft_shadow_samples times pixels times samples_per_pixel (and believe me, 10 samples each for the reflection and refraction is not very smooth in my experience!)
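
The growth is easy to reproduce. This sketch assumes, as the worked numbers do, that every ray hits something and spawns a full set of 20 new rays; the helper names and the example light and sample counts are made up.

```python
def points_at_level(rays_per_point, level):
    """Shading points at a given recursion level, starting from one visible point."""
    return rays_per_point ** level


def shadow_rays_at_deepest_level(rays_per_point, max_depth, lights,
                                 soft_shadow_samples, pixels, samples_per_pixel):
    """Shadow rays needed just for the deepest level of recursion, over the whole frame."""
    deepest = points_at_level(rays_per_point, max_depth)
    return deepest * lights * soft_shadow_samples * pixels * samples_per_pixel


levels = [points_at_level(20, k) for k in range(1, 6)]
print(levels)       # [20, 400, 8000, 160000, 3200000]
print(sum(levels))  # 3368420 secondary rays for a single visible point

# Hypothetical frame: 640x480, 1 sample per pixel, 2 lights, 10 soft shadow samples.
print(shadow_rays_at_deepest_level(20, 5, 2, 10, 640 * 480, 1))  # 19660800000000
```

Even with these modest settings the count lands in the tens of trillions, which is why real renderers cut recursion off early or reduce samples at deeper levels.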

A veritable explosion of rays, as I am sure you see. I won't even begin to discuss radiosity, because that's actually slow, and computationally intensive. :)

Now, we get to the subject of Doom3... I'm not sure this hardware would actually be that well suited to Doom3. You know all the lovely shading effects, with detailed highlights and bump mapping? They pretty much define the Doom3 Experience. That all comes from a technology called programmable shading. Basically, while your GPU is rendering the polygons in the game, it runs a tiny little program that determines the precise shade of black for each and every pixel.

A raytracing accelerator takes advantage of the fact that ray hit-testing is a very repetitive chore which can be done in hardware very efficiently.

But, as you can see, most of the really interesting rays in a scene are the "secondary rays." The rays that are for reflection and refraction and lighting and such. So, suppose this card calculates a ray, and figures out the point that needs to be shaded. Because the accelerator is all in hardware, for programmable shading like Doom3, it would need to hand-off back to the host processor, which would run the shader code, which would ask for 20 more rays, etc. So, with a fully fixed function raytracer, there would either be annoying limits on what the scene could look like, or you would constantly be going back and forth be

It takes more than a chip by sacrilicious · 2005-03-14 18:36 · Score: 4, Interesting

Within the last five years I worked for a company that made 3D rendering chips. The operation that was encoded in hardware was that of testing a ray against a triangle; on the chip produced by my former employer, this operation could be done in parallel something like 16 times, using only one or two clock cycles.
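
For readers who haven't seen one, here is what a ray/triangle test looks like in software. This uses the well-known Möller-Trumbore formulation as a stand-in; nothing in this post says which formulation the chip actually implemented, and the batch loop at the end only hints at what the hardware would do in parallel.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the ray origin


def first_hit(origin, direction, triangles):
    """Test one ray against a batch of triangles and keep the nearest hit."""
    hits = [t for t in (ray_triangle(origin, direction, *tri) for tri in triangles)
            if t is not None]
    return min(hits) if hits else None


tri = ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
print(ray_triangle((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), *tri))  # 5.0
print(ray_triangle((2.0, 2.0, 0.0), (0.0, 0.0, 1.0), *tri))    # None
```

Each test is a fixed handful of multiplies, adds and compares with no data-dependent loops, which is what makes it a natural candidate for replicating 16 times in silicon.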

Once this functionality was achieved, there were some contextual architectural decisions to be made about what ASIC would include these gates. The company decided to implement these gates on a chip that had about 16MB of ram on it and its own execution unit (vaguely like one of the subchips in IBM's upcoming Cell architecture, IIUC) and then to put arrays of these independent exec chips on daughter cards.

Many of these decisions were trying to solve the specific problems of raytracing, e.g. how do we get geometry info into the chips efficiently, how can we parallelize the running of shaders so they don't bottleneck things, etc. These problems manifested themselves quite differently than they did for zbuffering hardware, and there were lots of clever-yet-brittle constructs used which could be shown to work in specific cases but which had pot-holes that were hard to predict when scaling or changing the problem/scene at hand.

Rather than selling these chips themselves, the company decided that programming them was hard enough that the company itself would package up the chips into a "rendering appliance", which was essentially a computer running linux with a few of these daughtercards in it. For a software interface to rendering packages, the company chose RenderMan. The task then became to translate rendering output from disparate sources (Maya, etc) into RenderMan expressions, and this was devilishly hard to get right. Each rendering package had to be individually tweaked in emulation, and some companies didn't help out much with info, and even those that did weren't able to supply all the info needed in many cases... my former employer ended up chasing un-spec'd features down ratholes.

The end result was really a disaster. Nothing worked quite right, which was problematic because these chips were marketed not just as fast but as faster drop-in replacements for existing software renderers.

I find it interesting how this entire tsunami of problems snowballed from the initial foundation of how raytracing algorithms (and therefore hardware) are different from zbuffering.

--
- First they ignore you, then they laugh at you, then ???, then profit.

Re:It takes more than a chip by gr8_phk · 2005-03-15 02:38 · Score: 2, Interesting

You're quite right about the differences between rasterizing and ray tracing. In my RT library, polygon (and other object) intersection tests are not the limiting factor, not even close. Using a proper scene subdivision structure you generally only do a couple intersections per pixel, and then only one shading operation. This means a 1 megapixel display running at 60 FPS need only do 60M shadings per second (barring reflections etc). The bottleneck in my code is a recursive tree traversal. Unfortunately once you optimize that, you still have a very strange mix of tree traversal, intersection test, shading, and recursive rays. It's not a problem that makes for easy hardware pipelining.
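
The shading budget works out as follows. This sketch takes "1 megapixel" as 1000x1000 and reads "a couple intersections per pixel" as exactly two, both of which are assumptions made here for the sake of round numbers.

```python
def per_second(width, height, fps, per_pixel=1):
    """Operations per second for a given per-pixel workload."""
    return width * height * fps * per_pixel


print(per_second(1000, 1000, 60))               # 60000000 shadings per second
print(per_second(1000, 1000, 60, per_pixel=2))  # 120000000 intersection tests per second
```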
Shameless plug: rtChess
The library is actually much faster now than that old release of the game. If Moore's law continues by giving us multicore chips, you'll have a realtime raytraced FPS, and our chess game will be sluggish with photon mapping, both in about 7-8 years.

OpenRT + CELL by Anonymous Coward · 2005-03-14 19:31 · Score: 2, Interesting

Could CELL be programmed to do OpenRT as efficiently as this chip?

Slashdot Mirror
