Optimizing distcc

Posted by michael on Tuesday March 30, 2004 @09:26AM from the barn-raising dept.

IceFox writes "Having fallen in love with distcc and its ability to speed up compiling (insert anyone who compiles, like Gentoo users or Linux developers), I recently got the chance to dive deeper into distcc. By itself distcc will decrease your build times, but did you know that if you tweak a few things you can get much better compile times? Through a lot of trial and error, tips from others, profiling, testing and just playing around with distcc, I have put together a nice big article. It shows how developers can get a bigger bang for their buck out of their old computers and distcc with just a few changes."

Reliefe for the /. site by Anonymous Coward · 2004-03-30 09:32 · Score: 2, Informative

distcc optimizations - March 30th 2004
and how to compile kdelibs from scratch in six minutes

If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc sending compile jobs to other computers when free, thus distributing compiles and dramatically decreasing build times. Best of all it is very easy to set up.
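
For readers who have not used it, here is a minimal sketch of what "sitting between make and gcc" looks like in practice. It only builds the command line and environment; it assumes distcc's documented `DISTCC_HOSTS` variable and the usual `CC="distcc gcc"` override. The host names and job count are hypothetical, not taken from the article.

```python
import os
import shlex


def distcc_make_command(hosts, jobs, compiler="gcc"):
    """Return (argv, env) for a make run that routes compiles through distcc.

    hosts: list of host names for DISTCC_HOSTS (hypothetical in the demo below).
    jobs:  value passed to make's -j flag.
    """
    env = dict(os.environ)
    # distcc reads its list of build machines from DISTCC_HOSTS.
    env["DISTCC_HOSTS"] = " ".join(hosts)
    # CC="distcc gcc" sends every compile step through the distcc wrapper.
    argv = ["make", f"-j{jobs}", f"CC=distcc {compiler}"]
    return argv, env


if __name__ == "__main__":
    # Hypothetical farm: the local machine plus two helpers.
    argv, env = distcc_make_command(["localhost", "ophelia", "juliet"], jobs=6)
    print("DISTCC_HOSTS=" + shlex.quote(env["DISTCC_HOSTS"]), shlex.join(argv))
    # To actually run the build: subprocess.run(argv, env=env, check=True)
```

A sensible `-j` value depends on how many machines and CPUs are in the farm, so treat the number above as a placeholder.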

This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.

Before getting started: In conjunction with distcc there is another tool called ccache, a caching pre-processor for C/C++ compilers, that I won't be discussing here. For all of the tests it was turned off to properly determine distcc's performance, but developers should also know about this tool and use it alongside distcc for the best results and shortest compile times. There is a link to the homepage at the end of this article.

Farm Groundwork and Setup

As is the normal circle of life for computers in a corporate environment, I was recently lucky enough to go through a whole stack of computers before they were recycled. From the initial lot of forty or so computers I ended up with twelve desktop computers that ranged from 500MHz to 866MHz. The main limit on my choice was that I only had room in my cube for fifteen computers. With that in mind I chose the computers with the best CPUs. Much of the RAM was evened out so that almost all of the final twelve have 256MB. Fast computers with bad components had the bad parts swapped out for good components from the slower machines. Each computer was set up to boot from the CD-ROM and not output errors when booting if there wasn't a keyboard/mouse/monitor. They were also set to turn on when connected to power.

Having enough network administration experience to know better, I labeled all of the computers and the power and network cords that were attached to them. I even found different colored cable for the different areas of my cube. The first label specified the CPU speed and RAM size so later when I was given faster computers, finding the slowest machine would be easy. The second label on each machine was the name of the machine, which was one of the many female characters from Shakespeare's plays. On the server side a DHCP server was set up to match each computer with its name and IP for easy diagnosis of problems down the line.

For the operating system I used distccKNOPPIX. distccKNOPPIX is a very small Linux distribution that is 40MB in size and resides on a CD. It does little more than boot, get the machine online and then start the distcc daemon. Because it didn't use the hard disk at all, preparation of the computers required little more than testing to make sure that they all booted off the CD and could get an IP.

Initially, all twelve computers (plus the build master) were plugged into a hub and switch that I had borrowed from a friend. The build master is a 2.7GHz Linux box with two network cards. The first network card pointed to the Internet and the second card pointed to the build network. This was done to reduce the network latency as much as possible by removing other network traffic. More on this later though.

A note on power and noise: the computers all have on-board components. Any unnecessary PCI cards that were found in the machines were removed. Because nothing is installed on the hard disks they were set to spin down shortly after the machines are turned on. (I debated just unplugging the hard disk, but wanted to leave the option for installation open for later.) After booting up and after the first compile, when gcc is read off the CD, the CD-ROM also spins down. With no extra components and no spinning CD-ROM or hard disk drives, the noise and heat level in my cube really didn't change any that I could notice (there were of course jokes galore by everyone about saunas and jet planes when I was setting up
1. Re:Reliefe for the /. site by IceFox · 2004-03-30 10:11 · Score: 2, Informative
  
  Hmm didn't finish reading the article did you (that was in the parent poster!)? If you had you would see that in fact the noise level didn't rise in my cube. :D -Benjamin Meyer
  
  --
  Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
Re:strlen by Anonymous Coward · 2004-03-30 09:35 · Score: 1, Informative

"Hear, hear."
Distccd for cygwin by aberant · 2004-03-30 09:38 · Score: 5, Informative

My life changed the day I found out I could get my super fast P4 Windows XP box to compile for my slow Linux box. Distcc for cygwin is a miracle. Check out the thread at Gentoo forums
Martin Pool interview by Wise+Dragon · 2004-03-30 09:39 · Score: 5, Informative

Martin Pool, the brains behind distcc, was interviewed by ZDNet yesterday. How timely.

http://web.zdnet.com.au/builder/program/work/story/0,2000034960,20283318-1,00.htm
1. Re:Martin Pool interview by Wise+Dragon · 2004-03-30 09:58 · Score: 2, Informative
  
  Oh and look, it made Slashdot (in the Developers section).
Mirror by Rufus211 · 2004-03-30 09:40 · Score: 4, Informative

I feel like burning my new site in a bit =)

http://hackish.org/~rufus/distcc.php.html
Mirror by after · 2004-03-30 09:42 · Score: 2, Informative

The article is loading really, really slooooow, but I was able to get an HTML-only copy of it.
Re:Martin Pool interview - clickable link by Poisonous+Drool · 2004-03-30 09:43 · Score: 4, Informative

Developer Spotlight: Martin Pool
Re:ccache by aridhol · 2004-03-30 09:44 · Score: 2, Informative

Yeah, he says that ccache would speed up the compilation, but he specifically disabled it so it wouldn't interfere with his timings (later runs would appear more efficient than they should be).

--
I can't say that I don't give a fuck. I've just run out of fuck to give.
Improving builds. by Anonymous Coward · 2004-03-30 09:45 · Score: 2, Informative

(1) Use Scons
(2) Use --jobs=2 (or however many processors you have).

Build times will be greatly improved - and it's cross platform as well.

In my opinion - especially if you have a complicated project - distcc isn't worth it. The machine takes so long pre-processing everything (including header files) that you lose whatever advantages you might have with offloading the actual compilation work. It's especially useless with MSVC once you start using precompiled headers.
Or... You could do it properly. by Moderation+abuser · 2004-03-30 09:49 · Score: 4, Informative

Install Sun Grid Engine[1] since it's free and now open source and then not only do you get qmake for distributed builds but you also get a general purpose distributed processing system. And hey! It even has the current buzzword "grid" in the title so your PHB will love you.

[1] http://gridengine.sunsource.net/

--
Government of the people, by corporate executives, for corporate profits.
1. Re:Or... You could do it properly. by Anonymous Coward · 2004-03-30 11:15 · Score: 1, Informative
  
  I have never tried qmake on grid engine, but it is horribly slow at scheduling tasks to run on our cluster even with limited amounts of jobs in the queue. I wonder what the performance is like compared to distcc. If you are performing nightly builds that take 10 hours to complete then it is no problem. But if you are doing interactive building where you are expecting under 5 min builds then the scheduling could be a problem.
Why wasn't a factorial experiment used? by alptraum · 2004-03-30 09:50 · Score: 4, Informative

Sigh, another experiment that could have benefited greatly from factorial experimentation. If you're unfamiliar with DOE, here is a basic introduction courtesy of NIST:

http://www.itl.nist.gov/div898/handbook/pri/section1/pri11.htm

In this case we have a variety of factors and a single response, "elapsed time" for compilation, and it is a minimization problem. Instead of looking at factors individually, a factorial DOE would have allowed interactions to be analyzed and a global optimum to be sought, rather than just optimizing individual factors and then tossing them all together. It doesn't work that way a lot (or most) of the time.
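
To make that concrete, here is a rough sketch of a two-level full factorial design and the standard interaction estimate. The factor names and levels are hypothetical stand-ins, not the article's actual variables, and no timings are included; you would fill `times` with your own measurements, one per run.

```python
from itertools import product

# Hypothetical two-level factors, each listed as [low, high].
FACTORS = {
    "jobs": [4, 12],
    "network": ["hub", "switch"],
    "makefiles": ["recursive", "non-recursive"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def main_effect(runs, times, factors, name):
    """Mean response at the high level minus mean response at the low level."""
    low, high = factors[name]
    hi = [t for r, t in zip(runs, times) if r[name] == high]
    lo = [t for r, t in zip(runs, times) if r[name] == low]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


def interaction_effect(runs, times, factors, a, b):
    """Two-factor interaction for two-level factors.

    Mean response where a and b sit at the same level (both high or both
    low) minus the mean where they differ. Near zero means the two factors
    act roughly independently.
    """
    same, differ = [], []
    for run, t in zip(runs, times):
        a_high = run[a] == factors[a][1]
        b_high = run[b] == factors[b][1]
        (same if a_high == b_high else differ).append(t)
    return sum(same) / len(same) - sum(differ) / len(differ)


if __name__ == "__main__":
    # Print the run sheet: 2 x 2 x 2 = 8 configurations to time.
    for i, run in enumerate(full_factorial(FACTORS), 1):
        print(i, run)
```

With three two-level factors that is eight timed builds, and every run contributes to every effect estimate, which is the main appeal over changing one thing at a time.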

If the author of this article is present: Why wasn't a factorial experiment used?
Re:/.-ed already? by Thud457 · 2004-03-30 09:50 · Score: 2, Informative

A: 9/11/2001.

--
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
Electric Cloud by Anonymous Coward · 2004-03-30 09:50 · Score: 3, Informative

Yes, distcc is nice, but anyone with a really big build (say like hours long) must take a look at John Ousterhout's company Electric Cloud (yeah, John Ousterhout as in Tcl) here. They've built this replacement for gmake that runs the jobs in parallel but is smarter than distcc because it can break open all the recursive makes and run _everything_ in parallel and it works cross platform too. It's $$$ and not OSS :-) but designed to be ultrareliable.
Re:Gentoo Impact(s) by y2dt · 2004-03-30 09:52 · Score: 5, Informative

official gentoo distcc guide:
http://www.gentoo.org/doc/en/distcc.xml
Re:Article Text (Slashdotted Server) by Anonymous Coward · 2004-03-30 09:53 · Score: 3, Informative

You must be new so I will explain it to you. There is a class of users that regularly Karma Whore and get their Karma maxed out and then proceed to burn Karma by posting things like goatse.cx at +1. They do this not only to annoy people but to prove the flaws with the moderation system. While this guy may not be one of those people it is important to not reward somebody for posting an article. Users can easily post the article anonymously and avoid this issue altogether.
Re:behind the XCode curtain by Anonymous Coward · 2004-03-30 10:01 · Score: 5, Informative

Yup, look at the Xcode preferences for distributed builds. The cool part is they use Rendezvous to automatically find machines to send work. You can set your box to use these others and/or offer service to others. Also on dual processor boxes it will treat them as two machines and do two compiles at once.

Anyway, you can see distcc running when you have Xcode enabled for distributed builds and running.

--jim
Re:Wow... by supabeast! · 2004-03-30 10:03 · Score: 2, Informative

Actually, that's sort of off-topic because distcc negates the need to set up a beowulf cluster.
Missed the best point by MerlynEmrys67 · 2004-03-30 10:06 · Score: 4, Informative

He completely ignored the usage of distcc and ccache together. The pair of applications make for a huge win.
There are some problems though. Which do you do first, ccache or distcc? (The answer on my benchmarks is ccache: if it isn't in the cache, send it on the network.) How fast is your "build" machine? This is critical. The build machine is responsible for preprocessing the file, checking if it is in the cache and then sending it out to be turned into an object. Especially once you combine ccache (most of your builds are just the same files over and over, with very few "changed" files) with distcc, most of your time is spent in the first pass compiler.
In our environment we had boatloads of dual XEON machines around - they made wonderful build machines, and it didn't hurt that we connected them with Gig Ethernet either. Did wonders for our build times.
Overall, distcc and ccache are wonderful tools that should be in every large compile environment - making compiles that used to take days take simple minutes. But you want to make sure that the dependency between ccache and distcc works optimally in your environment.
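
The "cache first, network second" ordering can be sketched as a toy model. This is not how either tool is implemented: a dict stands in for ccache's store, a callable stands in for a distcc dispatch, and the hash is an arbitrary choice. In a real setup, ccache's documented `CCACHE_PREFIX` environment variable (for example `CCACHE_PREFIX=distcc`) is one way to get this ordering.

```python
import hashlib


def compile_with_cache(preprocessed, flags, cache, remote_compile):
    """Check the cache first; only on a miss hand the job to the remote compiler.

    preprocessed:   preprocessed source text (produced locally on the build machine).
    flags:          compiler flags, included in the cache key.
    cache:          dict mapping key -> object code (stand-in for the cache store).
    remote_compile: callable standing in for sending the job over the network.
    """
    key_material = " ".join(flags) + "\0" + preprocessed
    key = hashlib.sha256(key_material.encode()).hexdigest()
    if key in cache:
        return cache[key], "cache hit"
    obj = remote_compile(preprocessed, flags)
    cache[key] = obj
    return obj, "compiled remotely"


if __name__ == "__main__":
    sent = []

    def fake_remote(src, flags):
        # Placeholder for the network hop: record the job, return fake output.
        sent.append(src)
        return b"OBJ:" + src.encode()

    cache = {}
    src = "int main(void) { return 0; }"
    print(compile_with_cache(src, ["-O2"], cache, fake_remote)[1])
    print(compile_with_cache(src, ["-O2"], cache, fake_remote)[1])
    print("remote jobs sent:", len(sent))
```

Note that the preprocessing and the hashing both happen on the build machine before anything goes out, which is why its speed matters so much.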

--
I have mod points and I am not afraid to use them
1. Re:Missed the best point by IceFox · 2004-03-30 10:33 · Score: 2, Informative
  
  He completely ignored the usage of distcc and ccache together. The pair of applications make for a huge win.
  Actually I mentioned it in the first paragraph...
  
  --
  Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
Recursive Make Considered Harmful by JWhitlock · 2004-03-30 11:45 · Score: 5, Informative

There was an interesting paper by Peter Miller in 1997 called "Recursive Make Considered Harmful". It makes a good case for why recursive make is a bad idea, slowing down compile times and clouding dependencies. Benjamin Meyer has proved the point again, with his use of unsermake - if you generate a non-recursive make, then distributed compiles are twice as fast.
Unfortunately, the makefile creator most people use, automake, creates only recursive makefiles. Maybe a replacement like unsermake will get automake developers thinking about radical changes. I wouldn't mind seeing M4 go away, for one.
Re:Can distcc model be used for other apps? by lisany · 2004-03-30 11:55 · Score: 2, Informative

What really happens is that you can use the so-called "masquerading" installation method, which basically means you set up symlinks called gcc, g++ and whatever to the distcc binary. Prefix your PATH with this directory and calling `gcc` will work.

In my opinion this is easier (and better) than doing `make CC="distcc gcc"`
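
A minimal sketch of that masquerade setup, for anyone who would rather script it. The directory `~/distcc-masq` and the list of compiler names are assumptions made for illustration; your distribution may already ship a masquerade directory, in which case use that instead.

```python
import os


def make_masquerade_dir(directory, distcc_path, names=("gcc", "g++", "cc", "c++")):
    """Create symlinks named after compilers that all point at the distcc binary."""
    os.makedirs(directory, exist_ok=True)
    for name in names:
        link = os.path.join(directory, name)
        # Replace any stale link so the function can be re-run safely.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(distcc_path, link)
    return directory


def prefixed_path(directory, current_path):
    """Return a PATH value with the masquerade directory searched first."""
    if not current_path:
        return directory
    return directory + os.pathsep + current_path


if __name__ == "__main__":
    import shutil

    distcc = shutil.which("distcc")
    if distcc is None:
        print("distcc not found on PATH; nothing to do")
    else:
        # ~/distcc-masq is a hypothetical location chosen for this sketch.
        masq = make_masquerade_dir(os.path.expanduser("~/distcc-masq"), distcc)
        print("export PATH=" + prefixed_path(masq, os.environ.get("PATH", "")))
```

The real compiler still has to be reachable later in PATH, so the masquerade directory goes in front of it rather than replacing it.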
distcc isn't so great by Lord+Ender · 2004-03-30 13:13 · Score: 2, Informative

My roommate and I both use Gentoo. We also both have AthlonXPs. When we first turned on distcc, cutting our compile times in half, we were overjoyed. But then random compiles started failing. Not until I turned off distcc could I get some packages to compile. The point is, distcc isn't flawless.

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
1. Re:distcc isn't so great by Anonymous Coward · 2004-03-30 13:39 · Score: 2, Informative
  
  At work, we have a bunch of gentoo boxes and discovered the same thing as you. Emerges would randomly fail. We were hoping we could figure out why, but no luck. We had to pull distcc off the boxes.
  If I had more time I would trace through things and try to figure out why they failed. But I don't have that much time.
  
  I still like the idea behind distcc and hope that someday (soon) they'll get it working correctly.
2. Re:distcc isn't so great by KFK2 · 2004-03-30 14:04 · Score: 3, Informative
  
  Well.. here goes a couple of mod points that I spent.. but I'd thought I'd chime in..
  
  My friend recently had the same thing happen, and the conclusion we came to was that the compiler versions were different on the distcc servers (3.2.2) versus the client (3.2.3), and the preprocessed code had syntax errors or something of the like when it was sent off (something to do with one of the new options in the latest gcc). I don't recall exactly what option it was or what package(s) were failing... but I do know that somewhere there was an 'if gcc-version 3.2.3 then add some options to CFLAGS' (maybe /etc/make.globals? or make.conf).
  
  This is one of the biggest things I have found with distcc.. compiler versions have to be pretty similar.. usually even the incremental version changes affect the compiles..
  
  I've not had any problems with using distcc, both with compiling Gentoo packages, along with my own projects..
  
  Kenny
When to use distcc and ccache by xixax · 2004-03-30 13:48 · Score: 2, Informative

I went to a talk about these two tools, and getting the most out of them depends (to an extent) on knowing the nature of your compile. For example, if you are working on only a small part of a project comprised of many objects, you will probably benefit from ccache more than from distcc (in that only those objects affected by your code changes are rebuilt).

On the same tack, the performance of distcc will (to an extent) depend on the nature of the compilation task used in the test (I am not familiar with kdelibs).

--
"Everything is adjustable, provided you have the right tools"
Plug for Xcode... by boola-boola · 2004-03-30 14:02 · Score: 2, Informative

While we're on the business of discussing distcc, I've gotta say... Xcode supports it quite nicely (including the pretty GUI distcc Monitor), and _ALL_ it takes is checking two boxes in the preference panel. I'm serious.