Renderfarm Setup Tips?

Posted by Cliff on Tuesday June 15, 2004 @05:15AM from the engineering-massive-raster-output dept.

CarlosOlivaG4 asks: "We're in the process of acquiring and setting up a renderfarm, and I'm hoping the Slashdot community might light us up a little here. We'll use 6 to 8 nodes first, but would like to be able to expand it in the future." There was an earlier version of this question, but it dealt more with the hardware of the farm's nodes, rather than the network and software infrastructure on which these nodes would be based.

"On the hardware side, we still haven't made a choice between using AMD's Opteron or Apple's Xserve G5 (they have some very nice and conveniently priced cluster nodes which seem to be ideal for this kind of job), with Linux. As for the networking between them, is Gigabit Ethernet enough or should we be going for fiber? The software used to manage the render queues is another important point as well: I've been looking into Rush, and even though it's a commercial package, it works on all of the platforms we currently use (W2k/XP, Irix, OS X and Linux). But then there is also DrQueue, which is open source and is supported on at least the *NIX members of the aforementioned OSes. Other options include RenderPal and Pixar's RenderMan, but I would prefer an F/OSS alternative. Finally, it's worth noting that we'll be using the renderfarm for Maya and Adobe After Effects."

Cinelerra by selfabuse · 2004-06-15 05:16 · Score: 5, Informative

You might want to check out Cinelerra. It has pretty good support for renderfarms. I built one out of scrap 300 MHz machines, and it only took a weekend.
1. Re:Cinelerra by selfabuse · 2004-06-15 05:20 · Score: 3, Informative
  
  And here is the link I forgot.
node deployment: g4u! by hubertf · 2004-06-15 05:22 · Score: 4, Informative

Check out g4u for deploying your render machines - it's an image-based disk cloning tool that uses DHCP and FTP and which doesn't care what you run on your clients. (g4u itself is based on NetBSD, but that doesn't matter for the application).

I've used g4u to set up a ~50 node video rendering cluster, see my webpage on the Regensburg Marathon Cluster.

Enjoy!

- Hubert
Our experiences by Thagg · 2004-06-15 05:25 · Score: 5, Informative

We had a renderfarm for "The Chronicles of Riddick" of 40 boxes. Each box was a dual-proc Opteron.

We evaluated several render-queue management systems, and decided on Rush. The most persuasive arguments for using Rush were the very good experiences we had heard about from other users, and the simplicity of extending it to manage a variety of different tasks. I have to add Hammerhead to the list of happy customers. It did everything we could have hoped for. In particular, it was able to handle the inevitable crashing of machines pretty well.

While it's true that Rush is a proprietary, gotta-pay-for-it system, a robust render queue management system pays for itself very quickly in the ability to make your renderfarm productive. Perhaps a render queue manager is overkill when you have just 6 or 8 systems, but once you get up to 30 or 40 it is essential.

Our experience is all under Linux, but if you're going to be running After Effects that means that you're not going to be running Linux -- so there's not too much more I can help you with there. We did find that the dual Opterons worked much more efficiently than dual Xeons in multiprocessor rendering -- don't know about the Xserves, though. We were running mostly Maya, RenderMan, Shake, and our own in-house tools on the farm.

This farm is unfortunately powered down now that Riddick is done -- if you need some dual opterons, let me know at thad@hammerhead.com

--
I love Mondays. On a Monday, anything is possible.
1. Re:Our experiences by Anonymous Coward · 2004-06-15 07:06 · Score: 2, Informative
  
  The Maya renderer is not SSE optimized, so the extra fpu in the Opteron and Athlons gives them an edge. The Opterons are about equal to P4s in SSE enabled apps, the Athlons lag behind. Mac hardware is overpriced, especially if you are going to buy more in the future. And find some benchmarks - I don't think many apps use Altivec, but many do use SSE/SSE2.
  
  www.zoorender.com has a limited/single benchmark for Maya and Mental Ray on a variety of hardware. Pay attention to their comments about how the Maya renderer has better throughput on a dual-proc box when you run 2 processes vs 1 process with 2 cpus. Our queue did that by default.
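
  As a rough illustration of the "2 processes vs 1 process with 2 cpus" point, here is a minimal Python sketch that splits a frame range into one contiguous chunk per process. The function name, the `my_render` command and its arguments are all hypothetical placeholders, not the interface of Maya, Rush or any other real tool; substitute whatever your renderer and queue actually expect.

```python
def split_frames(first, last, workers):
    """Split the inclusive frame range [first, last] into contiguous
    chunks, one per worker process. Chunks differ in size by at most one
    frame; workers that would get no frames are simply left out."""
    total = last - first + 1
    base, extra = divmod(total, workers)
    chunks = []
    start = first
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        chunks.append((start, start + size - 1))
        start += size
    return chunks


if __name__ == "__main__":
    # Two render processes on one dual-processor box (assumed setup).
    for chunk_first, chunk_last in split_frames(1, 100, 2):
        # "my_render" is a made-up command line, shown only for shape.
        print(f"my_render --start {chunk_first} --end {chunk_last} scene_file")
```

  Whether two single-threaded processes really beat one two-threaded process depends on the renderer and on having enough RAM for both, so it is worth benchmarking on your own scenes.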
  
  Rendering is all about CPU power and dual procs scale almost linearly. It's also nice to have half the servers to manage and also the flexibility of running 1 or 2 processes on a dual proc box with 2GB+ RAM. But dual-proc hardware is more expensive - the cheapest option is always consumer whitebox hardware. Also, under Windows, there is a ~1.6GB memory limit for a single process. But XP service pack 2 may expand this (for certain hardware? I don't know details.)
  
  Eventually you will have I/O bottlenecks, and this will become your main headache. 6-8 procs should work for normal samba/nfs NAS. I don't know AE, but it may render much faster than 3D stuff - in which case you will probably use attached disk.
  
  I can also vouch for RUSH's internals, I signed a NDA and got to see the code/architecture. It is amazingly scalable. We were considering buying it to replace our homegrown queue that managed 350 procs. Instead, we went out of business.
  
  But, with any queue you buy, there will always be scripting 'glue' and troubleshooting that needs to be done (usually: can't find the files, I/O bottlenecks on the files, or tuning rendering/raytrace parameters). Bottom line: if I were you, I'd find some AE benchmarks on different hardware, then I'd get dual-proc Opterons (for flexibility and memory bandwidth) with 2GB RAM to start, and run Linux - but I think AE will force you to run XP(?)
2. Re:Our experiences by Thagg · 2004-06-15 07:06 · Score: 4, Informative
  
  Good questions
  
  1.) The words 'dual' and 'Opteron' both surprised us. We were kind of under the impression that maybe single proc machines would be better for a render farm. We were really curious why dual was chosen over single. Did the extra cost end up being worth it?
  
  The computers are relatively cheap -- it's render licenses (especially for RenderMan) that are expensive. With the newest version of RenderMan, Pixar has deigned to let us use the two processors of a dual-processor machine with a single license. This lowers the cost of rendering by about 60%, if the machine rendered twice as fast with dual processors. In fact, for RenderMan, the Opterons were indeed almost twice as fast, where the Xeons were only about 50% faster.
  
  Our other big rendering application was Shake, and it also allowed the use of two processors with one license.
  
  2.) You mentioned that Opteron was more efficient than Xeons. I just had to ask: Was the particular software you were using particularly tuned to Opteron (i.e. 64-bit?) or was the 32-bit side of it just pleasant to work with? Any more insight you can share with me about the use of Opteron would be most helpful.
  
  Yes and no. The Opterons are the first AMD machines to implement the SSE2 instructions, which are heavily used by RenderMan. Also, the HyperTransport communication between processors on the Opteron is light-years beyond the communication between Athlons and Xeons. On the other hand, there is absolutely no advantage in the 64-bitness of the Opterons -- we were running a 32-bit Linux (RedHat 9), and we weren't using more than 4 GB memory on any of the boxes.
  
  3.) Did you guys end up buying a bunch of machines from a place like IBM or something, or was it more like "we bought the components and assembled ourselves..?" If it's the former, how'd you like the service?
  
  We hired a beige-box manufacturer. We specced it out to various places, and PC Mall built them for the best price. If I had to do it over again, I'm not sure that I wouldn't go with IBM -- while they cost a lot more, I expect that they'd build more solid systems.
  
  4.) Any regrets or things you'd do differently next time around?
  
  We bought minitower machines instead of the more trendy, space- and power-efficient 1U or blade machines. We did that so that we could potentially use the new Gelato renderer from NVidia -- that software uses the current NVidia high-performance graphics cards as an external array processor, giving significantly better render performance.
  
  As we didn't end up using Gelato, that was perhaps a mistake. We ended up power and HVAC constrained in the end -- as happens with almost every renderfarm I've heard of.
  
  5.) Why are you getting rid of the machines used for Riddick? Or did I read that wrong?
  
  No, you read it right. Hammerhead is a small company, typically working on just one show at a time. We don't see a use for the machines for another nine months or so, as we begin development of the next project -- and it just isn't right to leave all that compute horsepower idle.
  
  Thad Beier
  Hammerhead Productions
  
  --
  I love Mondays. On a Monday, anything is possible.
3. Re:Our experiences by forkazoo · 2004-06-15 07:18 · Score: 2, Informative
  
  From previous posts, I seem to recall that you use Lightwave, right? Regarding 2), head over to blanos.com, and check out his benchmarks. I just looked at 2p P4 Xeons vs 2p Opterons. Looks like Opterons wipe the floor in the specific scenes I was looking at, but I didn't see any P4 Xeon 2p results for the radiosity test scene. I'd like to see that, as LW is supposed to use SSE pretty heavily in radiosity, so the clock speed of the P4 would be a potential bonus.
  
  The big difference between Pentia and Opterons is that Pentia use a shared front side bus, while the Opties use Hypertransport, with a memory controller on each CPU, so the CPUs aren't fighting each other to get at data to chew on as much.
4. Re:Our experiences by NanoGator · 2004-06-15 07:37 · Score: 2, Informative
  
  "From previous posts, I seem to recall that you use Lightwave, right?"
  
  Yes, that is correct.
  
  " 2), head over to blanos.com, and check out his benchmarks."
  
  Good idea!
  
  "but I didn't see any P4 Xeon 2p results for the radiosity test scene"
  
  Heh I just did that this morning. I ran the skull radiosity test with 8 threads. It was on a dual P4 Xeon 2.4 GHz with a 533 MHz bus and Hyperthreading enabled. 119s. (I'll try to remember to log that at Blanos, time permitting...) That was running LW8, not sure if that makes a difference.
  
  "The big difference between Pentia and Opterons is that Pentia use a shared front side bus, while the Opties use Hypertransport, with a memory controller on each CPU, so the CPUs aren't fighting each other to get at data to chew on as much."
  
  Ugh. I hate trying to balance all this stuff because it's not so clear what exactly LW needs. For example, it takes so long to render, that it's not all that clear that moving 300 megs of textures through the memory bus is a huge bottleneck. (Good thing we got blanos!)
  
  Thanks for the reply
  
  --
  "Derp de derp."
XGRID by artlu · 2004-06-15 05:28 · Score: 3, Informative

Recently, I have been working a lot with Apple's xgrid. We have been linking about 4 G5s/G4s together and getting impressive results. I don't understand your hardware situation, but if you are a Mac guy try it out.

GroupShares.com

--
-------
artlu.net
1. Re:XGRID by Johnny+Mnemonic · 2004-06-15 06:03 · Score: 2, Informative
  
  If you're considering Apple G5s, either in workstations or in Xserves, take a look at Apple's mailing list for help and resources. Folks there have been working on clustering and Xgrid on the Mac for a while now.
  
  --
  $tar -xvf .sig.tar
Re:Why run Linux? by Anonymous Coward · 2004-06-15 05:28 · Score: 1, Informative

Why would you run Linux on the Apple boxes? Wouldn't OSX be just as good?

Freedom.
Do it properly by smharr4 · 2004-06-15 05:29 · Score: 2, Informative

If you're going to do this, do it properly. Get systems with massive amounts of I/O that will cope with all the data you're trying to throw at them. For this kind of work, you need only buy from one vendor: SGI.

Don't bother with Intel/Linux, with dodgy hardware and the frequently-changing Linux code. Pay the money, get decent hardware with a support contract and a steady, stable, tried and trusted OS.

Apple *may* be an appropriate choice, now that Pixar have ported RenderMan to OS X, but I don't like the idea of my arrays running at 7200 rpm's. Get SGI, get fibre channel, and (possibly) get gigabit ethernet.

It'll all pay off - it won't be cheap, but in the long run, the results will be worth the money and the wait.
1. Re:Do it properly by Donny+Smith · 2004-06-15 05:50 · Score: 2, Informative
  
  >Get systems with massive amounts of I/O that will cope with all the data you're trying to throw at them.
  
  Render nodes get input via simple render scripts, and output frames get written to the file server one by one every X seconds as they get rendered. Textures are shared but it's never "massive" and never "thrown at them" (compute nodes).
  The I/O loading is concentrated on the file server.
  
  > Don't bother with Intel/Linux, with dodgy hardware and the frequently-changing Linux code.
  
  So HP, IBM and other Intel/Linux servers that rendered all those movies are "dodgy hardware"?
  
  >Get SGI, get fibre channel, and (possibly) get gigabit ethernet.
  
  I don't think 8 nodes can write and read data quickly enough to saturate a gigabit link to NFS server since while frames are being rendered NFS' I/O is very low.
  With that number of nodes perhaps they could muddle thru without external storage (maybe even internal SCSI would do). GbE is more important (and cheaper) than FC storage so I'd say GbE is a must.
  
  n1 n2 n3 n4 n5 n6 n7 n8
  
  [gigabit lan]
  
  nfs1
  (optionally with direct-attach FC disk array)
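
  The "8 nodes won't saturate a gigabit link" argument can be checked with some back-of-the-envelope arithmetic. The sketch below is only that: the frame size, texture volume and render time are assumed example figures, not measurements from any farm in this thread, and it reports an average that ignores bursts when several nodes finish at once.

```python
def link_utilization(nodes, frame_mb, texture_mb, render_seconds, link_mbit=1000):
    """Average fraction of the file server's network link in use when
    every node reads texture_mb of input and writes frame_mb of output
    once per render cycle of render_seconds.

    Sizes are in megabytes (10**6 bytes), link speed in megabits/second.
    """
    megabits_per_cycle = (frame_mb + texture_mb) * 8
    per_node_mbit_per_s = megabits_per_cycle / render_seconds
    return nodes * per_node_mbit_per_s / link_mbit


if __name__ == "__main__":
    # Assumed workload: 8 nodes, 10 MB frames, 300 MB of textures per
    # frame, 10 minutes of render time per frame.
    for speed in (100, 1000):
        used = link_utilization(8, 10, 300, 600, link_mbit=speed)
        print(f"{speed:>4} Mbit/s link: {used:.1%} average utilization")
```

  With these made-up numbers the server link sits at a few percent on gigabit and roughly a third on 100 Mbit, which lines up with the point that the file server, not the node links, is where I/O concentrates.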
pvm is the way by Goeland86 · 2004-06-15 05:31 · Score: 3, Informative

I think that what you're looking for is a renderfarm for computer graphics rendering, right? In that case, you should be looking at PVM or OpenMOSIX or even MPI. In either case, since you're going to have more data crunching than actual data transfer, I think that even 100BaseT would be enough. Gigabit will be nice, but fiber is not worth it.

DrQueue is nice... if you can get it to work, and I didn't. We used pvmpovray for many things, and I think that might be worth a look. pvmpovray exists for Gentoo with an ebuild script, which would keep the pain of installation and configuration to a minimum. But that option requires a conversion from Maya to POV-Ray files.

Also, I don't know what the pricing is going to be like, but if it were up to me, I'd take the Opterons, because I believe they are faster (although I'm not positive on that) and because I know they're well supported under Linux. I think that's a more personal choice to make, but the impression I got from AMD is that you always get the most for your buck. The Opterons also let you find replacement parts or upgrades a little easier than the G5 if you burn a CPU or motherboard. That's just my $.02 worth of advice.

--
---- I am certain of only one thing : I know nothing else.
Re:Why run Linux? by Anonymous Coward · 2004-06-15 05:32 · Score: 4, Informative

Why would you run Linux on the Apple boxes?
Because of its unnecessary flashiness? OS X is notoriously bloated. For the command line junkies among us, Linux fits the bill.
dynebolic by Anonymous Coward · 2004-06-15 05:42 · Score: 2, Informative

There is a distro called dynebolic which is specifically for multimedia. It features Veejay, Cinelerra, MJPEG tools etc. It is supposed to run off the CD. It is supposed to use every computer hooked to the network that is running dynebolic in the manner of a render farm. That is a very attractive idea. Go around the house, put in dynebolic CDs and go nuts.

I can't say that it is great because I haven't been able to do much of the above. Maybe you will have better luck.
Re:I had a related question by Anonymous Coward · 2004-06-15 05:52 · Score: 3, Informative

Take a look at Veritas Storage Foundation Cluster File System. You can have your 50+ Terabytes on a SAN fabric, and each server on the fabric can have the 50+ terabyte LUN mapped to them. The Cluster File System manages all of the concurrent access to the filesystem from each node so things don't get clobbered.

You'll get your performance through the SAN by utilizing high performance FCAL disks and multiple HBAs to your servers. You can have the load balanced across the HBAs to give you the bandwidth that you require.
Re:Software? by Anonymous Coward · 2004-06-15 05:55 · Score: 1, Informative

A great point hidden in the parent's post.
Call the vendor and/or FOSS project team and get reference customers you can talk to.
Depends on your workstations by BBStriker · 2004-06-15 05:56 · Score: 5, Informative

Hey

I've set up and administered a number of farms over the years (doing it as I type. It's... what I do). One thing you really want to do, certainly with Maya's renderer, is to try to use the same OS and platform on your farm as you use on your user workstations. There can be subtle or even obvious differences in the render output between OSes, and since you'll have enough issues to deal with you'll want to keep cross-platform incompatibilities out of the mix. Please, trust me on this. Had to deal with Maya Irix/Win2k/Linux differences in the past.
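
If you do end up with mixed platforms, one cheap sanity check is to render the same few frames on each and compare the results. The Python sketch below does a plain byte-for-byte comparison of frames with matching file names; the function names and directory layout are assumptions made for illustration. Be aware that a byte mismatch can also come from harmless metadata (timestamps, host names) embedded in the image file, so a flagged frame means "look closer", not "definitely different pixels".

```python
from pathlib import Path


def load_frames(directory):
    """Read every regular file in a directory into a {name: bytes} dict."""
    return {p.name: p.read_bytes() for p in Path(directory).iterdir() if p.is_file()}


def differing_frames(frames_a, frames_b):
    """Return the sorted names of frames present in both sets whose
    contents are not byte-identical. Frames missing from either side
    are ignored here and should be checked separately."""
    common = sorted(frames_a.keys() & frames_b.keys())
    return [name for name in common if frames_a[name] != frames_b[name]]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for two render output directories.
    linux_out = {"shot.0001.tif": b"AAAA", "shot.0002.tif": b"BBBB"}
    win_out = {"shot.0001.tif": b"AAAA", "shot.0002.tif": b"BBBC"}
    print(differing_frames(linux_out, win_out))
```

For real output you would call `load_frames()` on one directory per platform and pass both results to `differing_frames()`.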

As for queueing software, give Condor a look-see. Free and functional. I reverse-engineered a Perl version of it before they made their source available, and my version has been run quite successfully at several animation studios and an effects house over the years. It's a well architected system for distributed computing.

Feel free to contact me if you've got any other render system or management questions. I'm always interested in seeing how other studios approach the challenge.
Re:I had a related question by dolphin-brother · 2004-06-15 06:10 · Score: 2, Informative

That's a difficult question to answer without knowing something of your setup. How are the spindles organized--SAN, individual file servers NFS cross-mounting, or what? Which OSes are you running? Also, how much money are you able to spend to resolve this problem?
If you could rebuild everything from the ground up (and had tons of money to throw at it), you'd most likely want to build a system based on a very expensive vendor solution.
Assuming that you can't do that, your best bet would be to go with some sort of parallel filesystem, the likes of Lustre, GFS, Ibrix, GPFS or CxFS. The architectures of these vary, but the basic principle they share is performance scalability based on increasing the number of data paths to the disk. So if you have, say, 100 nodes on a high-speed network, you take 10 of them and attach them to your SAN. The parallel filesystem spans the entire SAN and so requests from the nodes can reach any bit on the SAN from any of the ten paths. If you need more performance, you add more paths: controllers, HBAs and storage nodes. I know GPFS scales linearly in performance based on the number of paths to the data, and I believe the others scale well also.
I haven't hit 50 TB on disk (I have on tape, but your post suggests that tape wouldn't give you the performance you need), but I have set up several 4-8 TB GPFS filesystems that could easily grow to 50 TB if I had the spindles.
Good luck finding a solution; symlink-based solutions on a conventional *NIX filesystem are a nightmare; I sympathize.
Re:Why run Linux? by GFLPraxis · 2004-06-15 06:17 · Score: 2, Informative

Actually, I find Mac OS X to run faster on my 1 GHz G4 than Mandrake Linux 10.0 on my 2.6 GHz Pentium 4. That flashiness doesn't slow it down that much, you see... the Mac makes use of the 3D graphics card (which usually does almost nothing when you're not playing a 3D game) to control all the window effects. As a result, the processor is hardly taxed.
For true 64 bitness, launch every Linux! by LightStruk · 2004-06-15 06:20 · Score: 2, Informative

Apple has not yet released a true 64-bit version of OS X, while Gentoo released a PPC64 version a few weeks ago. If you're going to buy 64 bits of CPU, you might as well get 64 bits of OS too.
nVidia Gelato by Guspaz · 2004-06-15 06:56 · Score: 2, Informative

You might want to look into nVidia Gelato. It's a 3D renderer that uses Quadro FX cards as secondary FPUs, supposedly doubling or more the speed of rendering. They claim it's two to six times faster than the leading renderer. There's a demo, so you can verify those claims for your uses.

It runs under Linux, and "will function with whatever [render farm] management system you currently use."

To reiterate, it's a SOFTWARE renderer, that is hardware accelerated by using the video card as a co-processor.
Re:I'm a Machead, but... by Chanc_Gorkon · 2004-06-15 07:05 · Score: 2, Informative

Um, why? Others have used Macs, and this is an ideal job for an Xserve; it obviously was not a problem for the person asking this question. Both platforms he has presented are new and are both going to cost him a lot of money. Plus, Apple has a Cluster Node config that will work well for him (only one hard disk; in this case you'd use some other storage or an Xserve RAID). In fact, I bet Pixar may use G5s soon if they don't have them already. Xserves are fast, and if you make your cluster out of them, you will get your renders much quicker.

--
Gorkman
Re:gigabit ethernet likely overkill by NovySan · 2004-06-15 07:20 · Score: 2, Informative

3D rendering is proc intensive. 2D composite work (Shake, Digital Fusion, Nuke, etc) is NETWORK I/O intensive. You must have gigE at least from the main background plate server and shared element server. The pipe that drops off the final product is usually sending (HD proxies for internal review) and receiving (final output frames) so it's best to have that gigE too.
Re:Deadline Render Queue (beta) by bhouston · 2004-06-15 08:33 · Score: 2, Informative

Cool stuff. We will be posting a new beta on the site later today -- with a couple of enhancements. I just loaded up Deadline Monitor here and it shows that we have 98 machines currently rendering, 3 idle, 41 offline (user workstations) and 5 machines disabled. Overnight during really heavy production loads (i.e. back in February with Scooby Doo 2) we would have all possible machines rendering except for the 5 or so disabled ones (which are file servers or underpowered machines). Thus from a pragmatic standpoint it scales at least to 150 machines without any trouble. How big is the blur render farm?
Advice gleaned from years of bitter experience by gorodish · 2004-06-15 09:44 · Score: 3, Informative

Having built two generations of renderfarms, and now working on a third, I'd suggest building it as cheap as possible. You will want to upgrade every 2 years or so, so make sure that you won't feel bad disposing of the old farm when it's time for the new.
Regarding networking: you have to look carefully at the way the farm will be used. If you are doing any kind of compositing (which requires high I/O rates), you'll benefit from gigabit ethernet. You'll also benefit from gigabit if you have exceptionally short render times (less than 30 minutes per frame), since in this case I/O is a significant fraction of each frame's render cycle. But the longer your per-frame render time, the less necessary gigabit is. We've always used 100base and it still serves us well. Fiber is expensive and provides nothing you'll need that copper can't provide.
The individual machines should have identical configurations and be interchangeable. Your goal is to not care when an individual machine dies. In light of this, there should be no local storage of data. You can save money on support if you buy spares instead of service contracts. Warranties also work, but the big manufacturers give their worst service to warranty-only customers.
Don't wire anything but ethernet to the machines. KVM wiring is expensive and unnecessary. Each machine should run unattended until it dies; when it does, you can wheel over a monitor and keyboard to diagnose it.
Opterons are fast, compatible, cool, lower-power and cheap. Xserves are nice, but we've found that Darwin doesn't integrate well into a pure Unix environment. You'd also be locking yourself into a single manufacturer.
Linux is cheap and effective, and easier to configure correctly as a server OS than as a desktop OS. There is so much commercial software available for it now that there is little reason to consider Windows or a commercial Unix. We haven't found Linux support from the big manufacturers to be all that great; if you use Linux, assume that you will have to solve most problems on your own.
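
To put rough numbers on the networking advice above (short renders benefit from gigabit, long ones barely notice), here is a small Python sketch that estimates what share of each frame's cycle a node spends moving data. The 300 MB per frame figure is an assumed example, and the model optimistically gives each node the full link speed with no contention or protocol overhead.

```python
def transfer_seconds(data_mb, link_mbit):
    """Seconds to move data_mb megabytes over a link_mbit megabit/s link."""
    return data_mb * 8 / link_mbit


def io_fraction(data_mb, render_seconds, link_mbit):
    """Share of one frame cycle (transfer + render) spent on transfer."""
    t = transfer_seconds(data_mb, link_mbit)
    return t / (t + render_seconds)


if __name__ == "__main__":
    DATA_MB = 300  # assumed input + output volume per frame
    print("render time   100 Mbit   1000 Mbit")
    for minutes in (1, 5, 30, 120):
        slow = io_fraction(DATA_MB, minutes * 60, 100)
        fast = io_fraction(DATA_MB, minutes * 60, 1000)
        print(f"{minutes:>7} min   {slow:>8.1%}   {fast:>9.1%}")
```

Under these assumptions the I/O share shrinks quickly as per-frame render time grows; where the break-even sits for your farm depends on your real data volumes and on how many nodes hit the file server at once.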
Post Production House by acidream · 2004-06-15 09:50 · Score: 2, Informative

I work at a small post production facility in Hollywood. We have a render farm of six dual Xeon Win2000 Boxx rack machines as well as five dual Xeon Win2000 Boxx workstations that render in their spare time. We run Maya and After Effects, and we use Smedge to handle the distributed rendering for Maya, AE and Mental Ray. We also have another dual Xeon box running Server 2003 with an eight-drive RAID setup with two main partitions: one is RAID 1 for the Maya scenes and the AE projects, and the other is RAID 0 for the rendered frames and comps. We started with just two rack machines, but we add one or two every couple of months when the budget permits. Our renders are sent to Smedge via a script that we run from an Interix csh. It greatly simplifies the process of sending out renders. Each project we work on has a script with the name of the project, or the shot, and when we're ready to render we run the script, which parses the text file with all the parameters for the render such as frame range, render quality and so on. Some people don't care for Smedge too much, but it gets the job done, and works for almost all 3D applications as well as most compositing apps.
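
For anyone wanting to copy the "one parameter file per shot" idea, a minimal Python sketch of the parsing half might look like this. The `key = value` file format, the key names and the function names are all invented for illustration; the post does not describe the actual file layout, and nothing here talks to Smedge or any other queue.

```python
def parse_render_params(text):
    """Parse 'key = value' lines into a dict.

    Blank lines are skipped and anything after '#' is treated as a
    comment. Raises ValueError on a line that has no '='.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        params[key.strip()] = value.strip()
    return params


def frame_range(params):
    """Turn a 'frames' entry such as '1-240' into (first, last) integers."""
    first, last = (int(part) for part in params["frames"].split("-"))
    return first, last


if __name__ == "__main__":
    example = """
    scene   = shot010.mb
    frames  = 1-240
    quality = production   # final pass
    """
    params = parse_render_params(example)
    print(params, frame_range(params))
```

The parsed values would then be handed to whatever submit command your queue manager provides.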
Re:I'm a Machead, but... by mcdesign · 2004-06-15 10:17 · Score: 3, Informative

But if you are using Shake, the OS X version is $2,999, which is $2,000 cheaper than the Linux version. The OS X version also comes with unlimited render-only nodes for free. Each Linux render node costs $1,499.
So for, say, 10 computers:
Cluster node version of Xserve: 10 @ $2,999 = $29,990*
Shake: 1 @ $2,999 = $2,999
Total = $32,989

Now the Linux version will cost:
Shake: 1 @ $4,999 = $4,999
Render nodes: 9 @ $1,499 = $13,491
Total software cost = $18,490
This leaves you with $14,499 to buy 10 x86 boxes, or $1,449.90 each. Those G5s don't seem so expensive after all.
* Yes, I know it will need more RAM, but so will the Linux boxes.
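
The comparison above is easy to redo for a different node count. This Python sketch uses the 2004 prices quoted in the post as defaults; the function names are made up for illustration, and the model follows the post's assumption that one Linux Shake seat covers the first machine and every additional machine needs a paid render node license.

```python
def osx_total(nodes, xserve_price=2999, shake_osx=2999):
    """Hardware plus one Shake seat; OS X render-only nodes are free."""
    return nodes * xserve_price + shake_osx


def linux_software_total(nodes, shake_linux=4999, render_node=1499):
    """One Shake seat plus a render node license for each other machine."""
    return shake_linux + (nodes - 1) * render_node


def linux_hardware_budget_per_box(nodes):
    """What each x86 box may cost for the Linux farm to match the OS X total."""
    return (osx_total(nodes) - linux_software_total(nodes)) / nodes


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(f"{n} nodes: OS X total ${osx_total(n):,}, "
              f"Linux software ${linux_software_total(n):,}, "
              f"break-even x86 box ${linux_hardware_budget_per_box(n):,.2f}")
```

This only covers Shake; licensing for Maya, RenderMan or After Effects would shift the totals on both sides.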