Distributed Compilation, a Programmer's Delight
cyberpead writes in with a Developerworks article on the open source tool options that can help speed up your build process by distributing the process across multiple machines in a local area network.
Sorry - compiling
Due to a strange quirk in the way compilers are designed, it's (MUCH) faster to build a dozen files that include every file in your project than to build thousands of files.
Once build times are down to 5 - 15 minutes you don't need distributed compiling. The link step is typically the most expensive anyway, so distributed compiling doesn't get you much.
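The "dozen files that include every file" trick is usually called a unity build. Here is a rough Python sketch of generating such files. The function name, the `unity_N.cpp` naming and the round-robin split are all made up for illustration and do not come from any particular build system.

```python
def unity_sources(sources, chunks):
    """Return {unity_filename: file_text}; each text #includes a slice of `sources`.

    Hypothetical helper: the names and the chunking policy are assumptions.
    """
    ordered = sorted(sources)  # stable order keeps the generated files reproducible
    chunks = max(1, min(chunks, len(ordered)))  # never more chunks than sources
    result = {}
    for i in range(chunks):
        part = ordered[i::chunks]  # round-robin so chunks end up similar in size
        result[f"unity_{i}.cpp"] = "".join(f'#include "{src}"\n' for src in part)
    return result


if __name__ == "__main__":
    for name, text in unity_sources(["b.cpp", "a.cpp", "c.cpp"], 2).items():
        print(name)
        print(text)
```

You would then compile only the generated files. One caveat: file-local names (static functions, anonymous namespaces) from different sources now share a translation unit and can collide.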
Imagine a Beowulf cluster for those.
Article summary: use 'make -j', 'distcc' and 'ccache' or some combination of these. These utilities are well known and widely used already, no?
"If you think the problem is bad now, just wait until we've solved it." --- Arthur Kasspe
There's a minor error in the article, which claims that your servers need access to the source. distcc was designed to not need this.
c++;
Slashdot readership plummets to an all-time low as programmers actually have to work.
Sky rockets in flight... distcc delight......
distcc deliiiiiiiight.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
It requires all preprocessing (-E option to gcc) to be done on one machine. This can frequently create such a big bottleneck (especially on C++ code) that it swamps any gain from the distributed obj compiling. The preprocessing takes so long that the rest of the machines are idle and waiting for jobs, and you are also sending huge text files across the network. I know distcc supports compressing these files, but even that is done serially on the initial compile machine.
distcc also is very flawed in a multiuser environment when it comes to fair distribution of jobs. You typically end up with one machine bogged down and the compiles actually end up taking longer than just doing everything locally.
I guess you are referring to the preprocessing step of C and C++ compilers, which was really a lame hack, I think. If you have a lot of include files, preprocessing produces large intermediate files, which contain a lot of overlapping code that has to be compiled over and over again.
Preprocessing should have been removed a long time ago, but because of nasty backwards-compatibility issues, it never was. Other languages, such as Java and D, solve this problem in a much better way, just as Turbo Pascal did with its TPU files in the late 1980s.
The article says each machine involved in a distcc compile must have access to all source files. This is not true: distcc runs the pre-processing stage, where all the header files are included, before sending the processed file to the machine(s) for compilation. They do not need the source and not even the includes/libs the software links against. All that happens on the central machine.
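To make the split concrete, here is a hedged Python sketch of the two gcc command lines involved: preprocess locally, compile the result elsewhere. This is not distcc's actual code. `split_compile` is a hypothetical helper, and it assumes include/define flags are written in joined form (`-Idir`, `-DNAME`).

```python
def split_compile(source, obj, flags):
    """Return (local_preprocess_cmd, remote_compile_cmd) for one C source file.

    Illustrative only: assumes joined -Idir / -DNAME flags, not "-I dir".
    """
    pre = source.rsplit(".", 1)[0] + ".i"  # .i is gcc's suffix for preprocessed C
    # Headers and macros are resolved here, on the machine that has the source.
    local = ["gcc", "-E", *flags, source, "-o", pre]
    # The remote side sees one self-contained file, so -I/-D are no longer needed.
    remote_flags = [f for f in flags if not f.startswith(("-I", "-D"))]
    remote = ["gcc", "-c", *remote_flags, pre, "-o", obj]
    return local, remote


if __name__ == "__main__":
    local, remote = split_compile("foo.c", "foo.o", ["-O2", "-Iinclude", "-DNDEBUG"])
    print(" ".join(local))
    print(" ".join(remote))
```

Only the second command needs to run on a helper box, which is why that box needs a compatible compiler but not your headers or libraries.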
He's using TCSH! That's BAD FOR YOU!
Ok, enough offtopic. This is actually pretty cool, considering our development environment is clusters and clusters of IBM P-Series LPARS, and our codebase is (A) disgustingly huge, and (B) actually pretty amenable to parallelized make.
FINALLY, I can justify to my boss that browsing /. is research! (Now if I could just make a good case for 4chan...)
Welcome to the Panopticon. Used to be a prison, now it's your home.
The reason for a lot of build machines in the rack may not be horsepower. Rather, you may need x different machine versions, or a certain build only builds on a certain machine because of licence restrictions, or you may have only one Windows box with the Japanese character set installed because it causes so many problems that multiplying them just isn't worth it, and so on and so forth. Building across n copies of the same machine version just isn't worth the work, IMO. Just get a bigger machine and save on the machine maintenance.
So the real benefit of distcc might be parallel compilation; I see a big future for this, particularly with the chipsets becoming commonplace. Once upon a time, I would not countenance a dual-chip machine in the rack because of the indeterminate mayhem it would sometimes cause to a random piece of code deep in the bowels. Those problems are well gone.
Umm. I wonder how this plays out with VMware? A distributed compiler smart enough to use the (correct) local compiler across a varied build set would be worth having ...
Patriotism is a virtue of the vicious
...snooze...
Dang, no info on creating uniform toolchains for each distcc arch. IncrediBuild at work is really good about that, though it has the distinct advantage of being able to just shoot a single executable over the wire if the remote end needs it.
It's similar to distcc, but with some notable benefits.
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
If you want to distribute compilations, you must not use gcc's 'native' option (as in -march=native). It causes each compiler instance to emit objects tuned for the host it happens to run on, and your compilation hosts may not all be identical.
I don't know the meaning of the word 'don't' - J
The only place this ever worked well, and transparently, was when using DSEE on Apollo.
But then, it could also reuse .o files already in place from some other developer's compilation. It was rigorous enough to know the dependencies leading to that .o were identical, including environment issues.
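I don't know how DSEE did it internally, but the general idea of "same inputs, so reuse the .o" can be sketched as a content hash, in the spirit of caches like ccache. Everything below (the function name, which inputs get hashed) is an assumption for illustration.

```python
import hashlib


def object_cache_key(preprocessed_source, compiler_version, flags, env):
    """Hash everything assumed to affect the object file into one hex key.

    Hypothetical sketch: a real tool has to track more inputs than these four.
    """
    h = hashlib.sha256()
    env_text = "\0".join(f"{k}={v}" for k, v in sorted(env.items()))  # order-independent
    for part in (compiler_version, "\0".join(flags), env_text):
        h.update(part.encode("utf-8"))
        h.update(b"\0\0")  # separator so adjacent fields cannot run together
    h.update(preprocessed_source)  # bytes of the already-preprocessed source
    return h.hexdigest()


if __name__ == "__main__":
    key = object_cache_key(b"int main(void){return 0;}", "gcc 4.3", ["-O2"], {"LANG": "C"})
    print(key)
```

Two developers who end up with the same key could share the object file; change a flag, a header, or the compiler and the key changes.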
Tried to find it, but couldn't. It goes something like this:
Panel 1: PHB, walking by Dilbert's cube: Dilbert, why aren't you working?
Panel 2: Dilbert: My programs are compiling.
Panel 3: PHB, sitting back at his desk by himself, thought bubble: I wonder if my programs compile.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
It's called an "interpreter".
(Cue flamewar in 3...2...1...)
Table-ized A.I.