Optimizing distcc
IceFox writes "Having fallen in love with distcc and its ability to speed up compiling (insert anyone who compiles, like Gentoo users or Linux developers), I recently got the chance to dive deeper into distcc. By itself distcc will decrease your build times, but did you know that if you tweak a few things you can get a whole lot better compile times? Through a lot of trial and error, tips from others, profiling, testing and just playing around with distcc, I have put together a nice big article. It shows how developers can get a bigger bang for their buck out of their old computers and distcc with just a few changes."
Yep, root of all evil. strlen. Fix strings and you'll fix everything.
For some reason, "Imagine a beowulf clusters using this" is on-topic.
This is so weird.
I must drink now.
"I do NOT suffer from a mental condition. I'm enjoying every second of it."
...maybe you should work on disthttpd next?
By the time I read the article, my kdelibs was compiled.
Looks like that server won't be doing much compiling soon...
From the article:
I even found different colored cable for the different areas of my cube.
I wonder if he also sealed the empty packaging, waste paper, and dead hardware in neat little foil packets before disposing of them in the proper receptacle, which, of course, sits right next to the cozy for his server. ;)
I also reply below your current threshold.
distcc optimizations - March 30th 2004
and how to compile kdelibs from scratch in six minutes
If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc sending compile jobs to other computers when free, thus distributing compiles and dramatically decreasing build times. Best of all it is very easy to set up.
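As a rough illustration of "sitting between make and gcc", here is a minimal Python sketch that assembles the environment and command line for a distributed build. DISTCC_HOSTS is the variable distcc reads for its host list; the helper name, the machine names and the two-jobs-per-machine rule of thumb are assumptions made for illustration, not figures from the article.

```python
import shlex

def distcc_make_command(hosts, jobs_per_host=2):
    """Return (env, argv) for running make through distcc.

    hosts: names of the volunteer machines (hypothetical examples below).
    jobs_per_host: assumed rule of thumb, not a number from the article.
    """
    # distcc reads the list of machines it may use from DISTCC_HOSTS.
    env = {"DISTCC_HOSTS": " ".join(["localhost"] + list(hosts))}
    # Ask make for enough parallel jobs to keep every machine busy.
    jobs = jobs_per_host * (len(hosts) + 1)
    # Prefixing the compiler with "distcc" is what puts it between make and gcc.
    argv = ["make", f"-j{jobs}", "CC=distcc gcc", "CXX=distcc g++"]
    return env, argv

env, argv = distcc_make_command(["ophelia", "juliet", "portia"])
print(env["DISTCC_HOSTS"])
print(shlex.join(argv))
```

The returned env and argv could be handed to subprocess.run on a machine where distcc is actually installed.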
This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.
Before getting started: In conjunction with distcc there is another tool called ccache, which is a caching pre-processor to C/C++ compilers, that I won't be discussing here. For all of the tests it was turned off to properly determine distcc's performance, but developers should also know about this tool and use it in conjunction with distcc for the best results and shortest compile times. There is a link to the homepage at the end of this article.
Farm Groundwork and Setup
As is the normal circle of life for computers in a corporate environment, I was recently lucky enough to go through a whole stack of computers before they were recycled. From the initial lot of forty or so computers I ended up with twelve desktop computers that ranged from 500MHz to 866MHz. The main limit for my choosing dealt with the fact that I only had room in my cube for fifteen computers. With that in mind I chose the computers with the best CPUs. Much of the RAM was evened out so that almost all of the final twelve have 256MB. Fast computers with bad components had the bad parts swapped out for good components from the slower machines. Each computer was set up to boot from the CD-ROM and not output errors when booting if there wasn't a keyboard/mouse/monitor. They were also set to turn on when connected to power.
Having enough network administration experience to know better, I labeled all of the computers and the power cord and network cord attached to each of them. I even found different colored cable for the different areas of my cube. The first label specified the CPU speed and RAM size so later when I was given faster computers, finding the slowest machine would be easy. The second label on each machine was the name of the machine, which was one of the many female characters from Shakespeare's plays. On the server side a DHCP server was set up to match each computer with its name and IP for easy diagnosis of problems down the line.
For the operating system I used distccKNOPPIX. distccKNOPPIX is a very small Linux distribution that is 40MB in size and resides on a CD. It does little more than boot, get the machine online and then start the distcc daemon. Because it didn't use the hard disk at all, preparation of the computers required little more than testing to make sure that they all booted off the CD and could get an IP.
Initially, all twelve computers (plus the build master) were plugged into a hub and switch that I had borrowed from a friend. The build master is a 2.7GHz Linux box with two network cards. The first network card pointed to the Internet and the second card pointed to the build network. This was done to reduce the network latency as much as possible by removing other network traffic. More on this later though.
A note on power and noise: the computers all have on-board components. Any unnecessary PCI cards that were found in the machines were removed. Because nothing is installed on the hard disks they were set to spin down shortly after the machines are turned on. (I debated just unplugging the hard disk, but wanted to leave the option for installation open for later.) After booting up and after the first compile, when gcc is read off the CD, the CD-ROM also spins down. With no extra components and no spinning CD-ROM or hard disk drives, the noise and heat level in my cube really didn't change any that I could notice (there were of course jokes galore by everyone about saunas and jet planes when I was setting up
After being posted on /.
"Dieing Ben-ja-min" - Short Circuit 2
Mod +5 Drunk
ccache is also nice for optimizing compiling. He probably mentioned it in the article, but since it seems /.-ed I wouldn't know... and by the time you've got both distcc and ccache running the article might be available again so you can read if you did it the right way :-)
Do you change clothes while making the "chee-chee-cha-cha-choh" transformation sound?
My life changed the day I found out I could get my super fast P4 Windows XP box to compile for my slow Linux box. Distcc for cygwin is a miracle. Check out the thread at Gentoo forums.
I wonder if all the time he'll save in his compiling will add up to the amount of time he spent figuring out how to speed it up + the time spent writing this article?
Martin Pool, the brains behind distcc, was interviewed by ZDNet yesterday. How timely.
http://web.zdnet.com.au/builder/program/work/story/0,2000034960,20283318-1,00.htm
I feel like burning my new site in a bit =)
http://hackish.org/~rufus/distcc.php.html
This was a great read... which I was fortunate enough to do before this poor guy's machine got /.ed. Anyway, an adaptation of this article aimed at specific users or tasks (developers, Gentoo users, etc) would be awesome! Kudos for the writeup. Can't wait to go home and try it out!
Have a Happy.
This is cool... I learned something on slashdot today. On a hunch I got a bash shell on my OSX box at home and typed "dist--", and lo there be distcc already installed and ready to go. That must be what they use for distributed builds in Xcode.
The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
The article is loading really, really slooooow, but I was able to get an HTML-only copy of it.
Developer Spotlight: Martin Pool
Currently running on 3 boxes at home. It makes compiles fun, since the other machines get to help out. Here at work it's another story: beefy Linux boxes in the server room help me out when I need to do emerges for my workstation or server. It's all good fun.
CB
free ipod and free gmail!
(1) Use Scons
(2) Use --jobs=2 (or however many processors you have).
Build times will be greatly improved - and it's cross platform as well.
In my opinion - especially if you have a complicated project - distcc isn't worth it. The machine takes so long pre-processing everything (including header files) that you lose whatever advantages you might have with offloading the actual compilation work. It's especially useless with MSVC once you start using precompiled headers.
Are there actually regular participants of Slashdot whose karma _isn't_ listed as excellent? If that's the case, how can "karma whoring" come into it at all? It's not like you get promoted to "excellenter" or "excellentest" or something.
Install Sun Grid Engine[1] since it's free and now open source and then not only do you get qmake for distributed builds but you also get a general purpose distributed processing system. And hey! It even has the current buzzword "grid" in the title so your PHB will love you.
[1] http://gridengine.sunsource.net/
Government of the people, by corporate executives, for corporate profits.
Sigh, another experiment that could have benefitted greatly from factorial experimentation. If you're unfamiliar with DOE, here is a basic introduction courtesy of NIST:
http://www.itl.nist.gov/div898/handbook/pri/section1/pri11.htm
In this case we have a variety of factors and a single response, "elapsed time" for compilation, and it is a minimization problem. Instead of looking at factors individually, a factorial DOE would have allowed interactions to be analyzed and a global optimum to be sought, rather than just optimizing individual factors and then tossing them all together. It doesn't work that way a lot (most) of the time.
If the author of this article is present: Why wasn't a factorial experiment used?
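For anyone wondering what a full factorial design would look like here, a minimal sketch in Python follows. The factor names and levels are illustrative assumptions (job count, LZO compression, hub versus switch), not the settings the article actually tested. The point is that every combination gets a run, so interactions between factors become visible.

```python
from itertools import product

# Hypothetical factors and levels, chosen only to illustrate the idea.
factors = {
    "jobs": [4, 8, 16],
    "lzo": [False, True],
    "network": ["hub", "switch"],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

runs = full_factorial(factors)
print(len(runs))  # 3 * 2 * 2 combinations
for run in runs[:3]:
    print(run)
```

Each run would then be timed (ideally more than once) and the elapsed times analyzed for main effects and interactions.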
Yes, distcc is nice, but anyone with a really big build (say like hours long) must take a look at John Ousterhout's company Electric Cloud (yeah, John Ousterhout as in Tcl) here. They've built this replacement for gmake that runs the jobs in parallel but is smarter than distcc because it can break open all the recursive makes and run _everything_ in parallel and it works cross platform too. It's $$$ and not OSS :-) but designed to be ultrareliable.
You must be new so I will explain it to you. There is a class of users that regularly Karma Whore and get their Karma maxed out and then proceed to burn Karma by posting things like goatse.cx at +1. They do this not only to annoy people but to prove the flaws with the moderation system. While this guy may not be one of those people it is important to not reward somebody for posting an article. Users can easily post the article anonymously and avoid this issue altogether.
If you knew you were going to be slashdotted, wouldn't you link to a static version of the article instead of one running a PHP script?
AccountKiller
He couldn't. It's simply a risk you take when posting the article. The moderation system is intended to improve things for the reader, not to judge his (undoubtedly good) intentions. You have a point though, maybe Redundant moderations shouldn't decrease karma, just like Funny doesn't increase it.
btw, posting the article as non-AC is viewed by many as karma whoring, so it's not recommended anyway.
Please, people, please, will you keep redudantly repeating what the definition of redundant means over and over again several times. It still hasn't sunk in yet. It still hasn't sunk in yet. The redunant part that is. By which I mean the definition of the meaning of what the word redudant means.
Because I was logged in, and didn't even think about hitting the 'Post Anonymously' button. I don't care about Karma, as alien a notion as that may be. Hence why I don't care if this gets modded Redundant, Informative, or Troll. I was simply trying to enable people to have an on-topic conversation when the original source of information was unavailable.
Apparently I should know better since I have a low userid, but to be honest, I don't post much, and I didn't know better, so I apologize.
There are some problems though. Which do you do first, ccache or distcc? (The answer on my benchmarks is ccache: if it isn't in the cache, send it on the network.) How fast is your "build" machine? This is critical. The build machine is responsible for preprocessing the file, checking if it is in the cache and then sending it out to be turned into an object. Especially when you combine ccache (most of your builds are just the same files over and over, with very few "changed" files) and distcc, most of your time is spent in the first pass of the compiler.
In our environment we had boatloads of dual XEON machines around - they made wonderful build machines, and it didn't hurt that we connected them with Gig Ethernet either. Did wonders for our build times.
Overall distcc and ccache are wonderful tools that should be in every large compile environment, making compiles that used to take days take mere minutes. But you want to make sure that the dependency between ccache and distcc works optimally in your environment.
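The "ccache first, then distcc" ordering can be sketched like this in Python. It assumes ccache's CCACHE_PREFIX setting, which tells ccache to run the real compiler through another wrapper on a cache miss. The helper names and file names are made up for illustration.

```python
import os

def ccache_then_distcc_env(base_env=None):
    """Copy an environment and make ccache hand cache misses to distcc."""
    env = dict(os.environ if base_env is None else base_env)
    # On a cache miss, ccache prefixes the compiler command with this program.
    env["CCACHE_PREFIX"] = "distcc"
    return env

def compile_argv(source, obj, compiler="gcc"):
    """Command line for one compile: ccache is consulted before anything else."""
    return ["ccache", compiler, "-c", source, "-o", obj]

env = ccache_then_distcc_env({})
print(env)
print(compile_argv("widget.c", "widget.o"))
```

With this ordering only files that miss the local cache ever cross the network, which matches the benchmark result described above.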
I have mod points and I am not afraid to use them
Lord of Ironhand, thanks for the polite bit of info about this possibly being viewed as karma whoring.
I appreciate it much more than those simply accusing me of karma whoring.
Hell yeah!
Maybe the mini-ITX cluster would come in handy for an additional *umph* with your large compiles? If they support PXE, you wouldn't even need the CDs.
We do not live in the 21st century. We live in the 20 second century.
Dude, check out your server, it's been hacked...
Mod +5 Drunk
So, then those are the types who get modded into oblivion. Self-correcting problem. To mod down a legitimate post (text of an unreadable site) because some of the people who do that might then post the goatse link, well, isn't all that realistic. (the fact that the goatse site is down notwithstanding).
Perhaps you're attributing motivations to this behavior (making a useful post) that don't apply.
Compiling time isn't an issue for you because your programs aren't large enough. You have no need for this tool. There will always be projects that take a non-trivial amount of compiling, no matter what language or technology you're using.
Is distcc integrated into the compiler components, or is it another layer below gcc, which divides up tasks?
If it's generalized, it would be cool to see it used for other CPU intensive tasks.. Video processing comes to mind. I would love to have a cluster bring down the times needed to:
- Convert MiniDV home video to MPEG2 DVDs. There are professional tools to do this.. A hobbyist tool that could do clustering would be excellent.
- Convert HDTV captures to MPEG2 for DVD archival. 1080i video processing involves some heavy number crunching.. downconverting a program for DVD archival takes hours of processing. Throw a few fast CPUs at it, and it could be done in real time.. This would make a nice back-end app for an HDTV PVR. You could take a 9GB HD program, and bring it down to 2GB.. making your PVR storage space last a lot longer.
Whether he knew it was a re-post is irrelevant. The fact is that it was a redundant post. Marking it as such means that, ideally, duplicate copies of the same info don't show up on my Slashdot thread. Having the text of the original article mirrored is useful, but having it mirrored multiple times in the same story thread just adds more useless clutter to be sifted through.
-GameMaster
Rules of Conduct:
#1 - The DM is always right.
#2 - If the DM is wrong, see rule #1
how would you know that the goatse site is... nevermind
Read about it on slashdot, oddly enough.
Unfortunately, the makefile creator most people use, automake, creates only recursive makefiles. Maybe a replacement like unsermake will get automake developers thinking about radical changes. I wouldn't mind seeing M4 go away, for one.
I use distcc and sometimes it doesn't seem to help, because I try to offload my compiles to my two slower computers first since I would rather keep my laptop CPU cooler. The problem is that sometimes it will actually take longer to compile. After reading about unsermake I really want to use it because I think automake is my bottleneck. The question is how do you do it? Where can I find unsermake and how do I configure distcc to use it? The article is great on explaining what to change but not how to change it.
Time makes more converts than reason
Sure distcc might be good for a few machines, but it doesn't scale well. Trolltech's Teambuilder is much better suited for a large scale distributed development environment. Ask Cisco. They evaluated both distcc and Teambuilder on huge multi processor Solaris systems. Guess who they chose, as it scaled better. That's right! Trolltech's Teambuilder! Plus, Teambuilder is much easier to set up, and has a very nice monitor to monitor your compile farm. Teambuilder
-- "Perceptions create reality. By changing your perceptions you change your reality."
Two things...
First, 10Mbit is plenty of bandwidth unless you're on a wimpy hub (people still use them). Get with the times and get a switch, it'll likely be 100Mbit anyways. Turning LZO on for 10Mbit may help, but the majority of the compile cycle (preprocess-send-compile-receive-link) will be spent doing compiling work (preprocess-compile-link), not sending and receiving. See the next paragraph for more...
Second, I must wonder if there is a diminishing returns effect with the addition of machines (especially the lower end models). I question this because increasing the number of jobs will add load to the master server with the preprocessing and linking. Not that this is a problem with a beefy server - but seriously, who has machines like that? :)
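The diminishing returns worry can be made concrete with a toy model in Python. It assumes the master preprocesses every file serially while compiles are spread evenly over the helpers, with the two stages overlapping, and it ignores linking and network time. All the numbers are invented for illustration and are not measurements from the article.

```python
def estimated_build_seconds(files, preprocess_s, compile_s, helpers):
    """Toy model: build time is set by whichever stage is the bottleneck."""
    master_time = files * preprocess_s        # serial work on the master
    farm_time = files * compile_s / helpers   # compile work shared by helpers
    return max(master_time, farm_time)

# Hypothetical project: 1000 files, 0.5 s to preprocess, 4 s to compile each.
for helpers in (1, 2, 4, 8, 16):
    seconds = estimated_build_seconds(1000, 0.5, 4.0, helpers)
    print(helpers, seconds)
```

In this model, once the farm outruns the master's preprocessing, extra machines stop helping, which is where a faster build master would start to matter.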
My roommate and I both use Gentoo. We also both have AthlonXPs. When we first turned on distcc, cutting our compile times in half, we were overjoyed. But then random compiles started failing. Not until I turned off distcc could I get some packages to compile. The point is, distcc isn't flawless.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
It would be cool to use a distcc client which took my local code diffs, distributed them around the Internet, patched the distributed "standard" version, cc'd the code, and sent back binaries to my client. Crypto hashes against the revised code could ensure that I was really getting binaries from my actual uploaded diffs. But then everyone with "difstcc" would be recompiling so much that we'd each return to our original CPU bandwidth ratios :).
--
make install -not war
I went to a talk about these two tools, and getting the most out of them depends (to an extent) on knowing the nature of your compile. For example, if you are working on only a small part of a project composed of many objects, you will probably benefit from ccache more than from distcc (in that only those objects affected by your code changes are rebuilt).
On the same tack, the performance of distcc will (to an extent) depend on the nature of the compilation task used in the test (I am not familiar with kdelibs).
"Everything is adjustable, provided you have the right tools"
While we're on the business of discussing distcc, I've gotta say... Xcode supports it quite nicely (including the pretty GUI distcc Monitor), and _ALL_ it takes is checking two boxes in the preference panel. I'm serious.
gcc: Error, invalid substitution.
gcc: Error, Syntax Error.
make: make failed, compiler returned -1 (Error)
-- Nate
It's not like you get promoted to "excellenter" or "excellentest" or something.
:-)
You mean you haven't been promoted yet? Ha! n00b...
Higher Logics: where programming meets science.
I'm curious to know if there is a distributed compile or build system out there for Java. I've looked around a little, and have only found a few abandoned open source projects. I'm surprised that there isn't some distributed Ant-based tool yet. Anyone out there know of anything?
And there's a damn good reason for it, too, but that's neither here nor there. Anyhow, this was fixed so you can do non-recursive stuff if you want to now.
Unfortunately, the very latest automake versions are trying to be way, way too clever, thereby breaking stuff in lots of projects. Time to throw it out and use something else.
Automake is a Perl script. It doesn't use M4. (Using M4 would make it more readable, ha.) You're thinking of autoconf.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
It's strange the article mentions unsermake but not icecream, which IMHO beats distcc, despite not being officially called stable. Icecream has a scheduler and is therefore noticeably less stupid with distributing the builds (it happened to me repeatedly with distcc that it sent the first job to the most loaded node, which kind of sucks if you just changed one file).
Icecream can be obtained the same way as unsermake; it's located in kdenonbeta/icecream.
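To show why a scheduler helps in the "just changed one file" case, here is a minimal Python sketch of load-aware host selection. This is not icecream's actual algorithm, only the general idea of sending the next job to the least loaded node. The host names and load figures are hypothetical.

```python
def pick_host(load_by_host):
    """Return the host reporting the lowest current load."""
    return min(load_by_host, key=load_by_host.get)

# Hypothetical load averages reported by each node.
loads = {"ophelia": 0.9, "juliet": 0.1, "portia": 0.4}
print(pick_host(loads))
```

A fixed host order, by contrast, would send a single job to whichever machine is listed first, no matter how busy it is.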
I understand the usage of distcc, and it seems quite helpful. But what about ccache? The available info does not say much, except that it "caches" the output in the following way: if the object files are already present, they are not compiled again.
But I thought that the 'make' program does exactly that: if a source code file is newer than the object file, then the source file is compiled; if not, the current object file is used.
What exactly does ccache do that make does not?
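One way to picture the difference, as a hedged Python sketch rather than a description of ccache's internals: make decides from file timestamps, while a compiler cache keys its results on a hash of what the compiler would actually see (the preprocessed source plus the compiler options). The class and function names are invented, and SHA-256 stands in for whatever hash the real tool uses.

```python
import hashlib

def make_would_rebuild(src_mtime, obj_mtime):
    """make's rule: rebuild if the object is missing or older than the source."""
    return obj_mtime is None or src_mtime > obj_mtime

def cache_key(preprocessed_source, compiler_args):
    """Key on the compiler's real input, not on timestamps."""
    digest = hashlib.sha256()
    digest.update(" ".join(compiler_args).encode())
    digest.update(b"\0")
    digest.update(preprocessed_source)
    return digest.hexdigest()

class CompileCache:
    def __init__(self):
        self.objects = {}
        self.real_compiles = 0

    def compile(self, preprocessed_source, compiler_args):
        key = cache_key(preprocessed_source, compiler_args)
        if key not in self.objects:
            self.real_compiles += 1
            # Stand-in for running the real compiler.
            self.objects[key] = b"object code for " + key.encode()
        return self.objects[key]

source = b"int main(void) { return 0; }\n"
cache = CompileCache()
cache.compile(source, ["-O2", "-c"])

# After "make clean" the object file is gone, so make has to rebuild...
print(make_would_rebuild(src_mtime=100, obj_mtime=None))
# ...but the cache still holds the result for identical input.
cache.compile(source, ["-O2", "-c"])
print(cache.real_compiles)
```

So the cache pays off exactly where make cannot help: after a clean, in a fresh checkout, or when a timestamp changes but the content does not.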
That paper makes a spectacularly bad case. It provides no serious analysis to back up its wild claims, and mixes variables quite horribly. Approximately one page of the paper is spent talking about recursive makefiles, while the bulk of it is just about various ways in which people write bad makefiles, which apply equally to recursive and monolithic ones. It's like saying "We administered the drug to the patient, and then he was hit by a cruise missile. The patient died, so we have to assume the drug is not safe for human use".
This is the sort of paper that would be rejected out of hand by any serious journal.
Fortunately, most people just use automake rather than paying attention to this nonsense.
build smaller things
the record for compiling a plan9 kernel is 15s
I built & installed the kernel and the whole distributed userland in 45 mins on a Duron 800MHz.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Yes and no.
First off the generalised methods you allude to are MPI, the older PVM, and there's Mosix too.
MPI and PVM are framework libraries that allow for code to be written to take parallelism into account. They tend to be used for numerics calculations (which was their birthplace), simply because numerics are CPU bound. There are others that are even more numerics centric (HPF - a Fortran variant, for example), but MPI should probably be the target of choice for new code, including non-numerics based calculations. Note that the term 'Beowulf cluster' implies MPI. The biggest point to note with MPI is that it allows for communication between nodes, and thus can be used for calculations that are not trivially parallelisable.
Mosix is subtly different. It's a patch to Linux that distributes processes across multiple boxes. This tends to work better for jobs with long runtimes, as opposed to many smaller duration processes, however. For example, Ralphzilla>
Parallel applications need to be written to target MPI or PVM, whereas Mosix doesn't need special targeting. On the other hand, any multiprocessor aware application will be more efficient at using them than any automated solution. Still, Mosix may well be sufficient for most purposes.
The downside to MPI &c is that they require libraries installed. Which raises dependencies, so most such applications tend to use their own libs, built in. This is actually not a bad idea - in that it allows more specific tailoring to take place. On the other hand, MPI &c libs can be tailored for a specific set up (e.g. using non-Ethernet connectivity - such as a mix of Ethernet, Myrinet and Papers, for example). That's a little out of the intended usage for distcc, however, where minimal set up times are desired.
On the video front, then, transcode has a built-in cluster mode, for pretty much what you were talking about. Again, its methods are all internal, but that's not an issue here.
That is why he mentions unsermake as an automake replacement to parallelize build files. This makes distcc scalable over many more machines.
That comment makes a spectacularly bad case. It provides no analysis to back up its wild claims. Approximately zero lines of the comment have to do with the paper, which it essentially mischaracterizes.
That paper makes a spectacularly bad case
It makes a fine case. The worst part is that it exaggerates the value of its own minor insight. The grandiose title harkens to the famous "Goto Considered Harmful", which in its time was a more insightful position.
Nobody should be surprised that globally correct choices cannot be decided with only locally correct data (for a non-greedy process, of course).
Moreover, the actual problems caused by suboptimal makefiles pale in comparison to what havoc goto can wreak. Anything wrong with makefiles can be solved by Moore's law (wait for the hardware to get faster, so you can do full rebuilds quickly). But spaghetti code makes it more difficult for programmers to work with software, and there has been no observed exponential growth curve of human intelligence.
people write bad makefiles
That's a cop-out. The Makefile system has turned out to be too flexible for most needs. Because the build system relies on the authors of individual makefiles, the behavior of different Makefiles can be completely different (they're arbitrary programs, after all). That problem is analogous to the non-existent "package manager" on Microsoft Windows. Each Windows installer is an arbitrary program that might do anything, and whose actions cannot be reasoned about by software tools.
Furthermore, having one makefile in every directory is an almost assured way to produce bad makefiles.
which apply equally to recursive and monolithic ones.
Wrong. There is an inescapable difference in the performance (both speed and correctness). Recursive simply cannot compare with monolithic.
Note that "monolithic" doesn't necessarily mean the makefile is stored in only one file on disk. A collection of files assembled via include directives is equivalent to monolithic, but somewhat easier for revision control. Non-"make" build control processes, such as Ant or those provided with some IDEs, also share the advantages of monolithic makefiles.
The software industry has already demonstrated its support for RMCH, because all new "yet-another-better-than-make" projects take its ideas as unavoidable preconditions.
I wonder how much noise and heat is generated by 15 PCs running in a small cubeacular office environment....
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.