A Distributed Front-end for GCC
format writes "distcc is a distributed front-end for GCC, meaning you can compile that big project across n number of machines and get it done almost n times as fast.
The machines don't have to be identical or be running the exact same GCC version, but having the same OS is helpful." With the advent of faster hardware, I can't complain about kernel compile times anymore, but larger source trees could definitely benefit from this.
That doesn't make too much sense. What if I had 50% 2.9 machines and 25% 3.2 machines, and a bunch mixed in-between? How would it know which version I wanted my program compiled with?
Mozilla
The machines don't have to be identical or be running the exact same GCC version
Well, to some extent they probably do. If you're running GCC 3.2 on one, you wouldn't be able to run 3.0 on another because of binary incompatibility.
Yay! My 133 doesn't have to take 25 billion years to compile anymore! Uhm, wait, I don't have any other computers... Shoot.
The sun compiler suite comes with dmake, which does the same on the level of make, rather than cc, but is essentially the same.
Definitely would make beowulf clusters interesting for compilation as well as hard core numerics (no joke intended).
You can almost never achieve a speed up of N. You can achieve S(N) = T(1)/(T(1)*alpha+((1-alpha)*T(1))/N+T0) Where T(1) is the time it takes to run the task with 1 computer, alpha is the part of the task that cannot be parallelised (as in startup registers etc.) and T0 is the communications overhead of the task.
:)
Just to clarify.
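For anyone who wants to plug numbers into that formula, here is a minimal sketch in Python. The function name and the sample figures (a 600 s build, 5% serial part, 2 s of overhead) are made up for illustration and do not come from distcc.

```python
def speedup(n, t1, alpha, t0=0.0):
    """S(N) = T(1) / (alpha*T(1) + (1 - alpha)*T(1)/N + T0)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = alpha * t1                  # part that cannot be parallelised
    parallel = (1.0 - alpha) * t1 / n    # part that is split across n machines
    return t1 / (serial + parallel + t0)


if __name__ == "__main__":
    # Hypothetical build: 600 s on one machine, 5% serial, 2 s communication overhead.
    for n in (1, 2, 4, 8, 16):
        print(n, round(speedup(n, t1=600.0, alpha=0.05, t0=2.0), 2))
```

With alpha = 0 and T0 = 0 the function returns exactly N; any serial part or overhead pulls the result below N.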
Sure, kernel compiles are fairly speedy but large projects still take forever to compile. Even on my AthlonXP 1900+. I downloaded the latest CVS snapshots of KDE in an attempt to de-bluecurve my new RH 8.0 machine and it took a couple hours to compile everything. Depressing.
Don't compile everything!
maybe mozilla would finally take less than an hour.. talk about bloat. This browser takes almost as long as the whole of kde...
--------------------------------- Born Again Bourne Again Believer: New Life, GNU/Linux Be Free!
This could really spur the development of OpenOffice.
With 50, 100 machines or so hooked up, OpenOffice's compile time could be reduced to as little as 1 or 2 days!!!
How about GigE.
Sorcerer, Lunar Linux, Gentoo and LFS! Yay!
So now we need to make a Distributed coder software...
You can almost never achieve a speed up of N. You can achieve S(N) = T(1)/(T(1)*alpha+((1-alpha)*T(1))/N+T0) Where T(1) is the time it takes to run the task with 1 computer, alpha is the part of the task that cannot be parallelised (as in startup registers etc.) and T0 is the communications overhead of the task.
This is the text book. Amdahl's law, IIRC.
In reality, and also in most text books, there are exceptions where the solution scales with the number of processes.
And it should be easy enough to see: 5 machines compiling one source file each are 5 times as fast as one machine compiling 5 source files.
As long as you start gcc 5 times in a row you have the same initialization overhead for EACH instance of gcc, one after the other.
If you manage to start gcc with a couple of source files as arguments, you save at least the loading time of the binary. That would correspond roughly to the alpha value.
Amdahl's law is useful for a single program/problem: try to parallelize gcc itself and you will find that compiling a single source file can't be sped up very much. So 5 processors running several threads of one gcc instance do not scale by 5.
However it says nothing about just solving the same problem multiple times in parallel.
Regards,
angel'o'sphere
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I sincerely hope Apple makes this feature into projectbuilder, which compiles insanely slow when compared to codewarrior. If it wasn't for the superior interface and integration with interface builder, I'd swap back to codewarrior right away.
Does anyone here know how good the speed increase is when compiling on dual G4s versus a single proc ?
When will I end this grieving ? When will my future begin ?
Taco, is that you?
Is this better than say, Group Compiler?
Could someone please point out the difference between a parallel and/or distributed make, like pmake?
It doesn't sound really reasonable to put the coding work into gcc when you would also like to have yacc/bison, a bunch of perl scripts, and whatever else you have in your makefile sped up.
Regards,
angel'o'sphere
I thought that the binary incompatibility was only a C++ thing. So for some projects, that's an issue, but not for all. Of course, the idea of being careless about which compiler version you're using for building a large project is rather strange.
An ICBM enema for him and his mom? I'm sure that GW would approve.
I think the biggest plus is that you can have one hella fast machine on your network running distcc that basically does all your compiling for all your other machines. I can see this being a big bonus for server farms like rackspace.com. The customers would be getting compile speeds from a big ass server, rather than just their little dinky Duron.
~LoudMusic
No sig for you. YOU GET NO SIG!
How about the same thing, but for Java? My massive Java project takes forever to compile, plus I'm impatient and incompetent.
Or maybe ASM.
By the way, have you tried Linux?
Whether you're looking to install Gentoo on a old pentium to use as a router or sacrificing your first born to compile KDE, it should make things go quite a bit faster.
Well unless every computer you own runs Gentoo you want to emerge world.
The C ABI is compatible between *all* GCC versions (and probably other compilers too). You can compile libgnome using GCC 2.95.2 and Nautilus using GCC 3.2 and not have any problems at all.
Could you imagine a...
nevermind.
My religion forbids the use of sigs.
Gee Dubya loves shooting things at Arab asses!
You'd think he was gay or something.
well, he is from texas....
I can imagine a day when, no one posts any of these lame ass jokes.... What next? Knock knock jokes?
Better move to Canada then. That way he can't violate you too.
Distributed Front End may be a bit of a misnomer. It appears this is a distributed preprocessor (article says it farms out preprocessed source to different copies of gcc.) Gcc presumably puts the preprocessed source through its own front end (parser), back end (optimizer), and produces binaries which are then linked on the main machine.
What's the problem? Optimizations are probably limited to each separately compiled module. Most optimizations will perform better across a larger code base. (Read the Dragon Book chapter 10 before telling me I'm wrong.) This method may produce valid code optimized by module but the code is nowhere as good as it could be. Making debuggable code is another challenge.
(If they ARE doing multiprocess optimization then I'm duly impressed. I doubt it, though.)
So now we need to make a Distributed coder software...
It's called Sourceforge.
If your system is well designed, compiling the entire thing should be a rare event. In a well designed project, most changes occur in c files or in headers only included in a few c files, so most changes only require compiling a very few files.
Compiling the whole source tree should be the sort of thing you do fairly rarely (for a big project), perhaps once a night, perhaps automated so no one has to watch it.
If compile time is something that is a significant problem for you, you really need to look at your code design.
The cake is a pie
One of the most useful things about Clearcase is its ability to "wink in" object files from other developers' views. That means, if one developer in the team has built a version with a certain time stamp, that particular object file never has to be built by anyone else in the group unless some dependency changes. It would kick ass if CVS had that capability.
Magnus.
From the ccache homepage, which is also a Samba hosted project
ccache is a compiler cache.
It acts as a caching pre-processor to C/C++ compilers, using the -E compiler switch and a hash to detect when a compilation can be satisfied from cache.
This often results in a 5 to 10 times speedup in common compilations.
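To make the "hash the preprocessed output" idea concrete, here is a toy sketch in Python. It is not ccache's actual implementation: the class, the stand-in preprocess/compile functions and the choice of SHA-256 are all assumptions for illustration, and the real tool takes more into account than this.

```python
import hashlib


class CompileCache:
    """Toy in-memory cache keyed on a hash of the preprocessed source."""

    def __init__(self, preprocess, compile_):
        self._preprocess = preprocess   # stand-in for running `cc -E`
        self._compile = compile_        # stand-in for running `cc -c`
        self._objects = {}
        self.hits = 0
        self.misses = 0

    def build(self, source, flags=()):
        preprocessed = self._preprocess(source, flags)
        # Key on the flags plus the preprocessed text, not the raw source.
        key = hashlib.sha256(
            "\0".join(flags).encode() + b"\0" + preprocessed.encode()
        ).hexdigest()
        if key in self._objects:
            self.hits += 1
            return self._objects[key]
        self.misses += 1
        obj = self._compile(preprocessed, flags)
        self._objects[key] = obj
        return obj


def fake_preprocess(source, flags):
    # Hypothetical preprocessor: drops blank lines and `//` comment lines.
    kept = [ln for ln in source.splitlines()
            if ln.strip() and not ln.strip().startswith("//")]
    return "\n".join(kept)


def fake_compile(preprocessed, flags):
    # Hypothetical compiler: just tags the text so we can see a result.
    return ("OBJ:" + preprocessed).encode()


if __name__ == "__main__":
    cache = CompileCache(fake_preprocess, fake_compile)
    cache.build("int x;\n")
    cache.build("// only a comment changed\nint x;\n")
    print(cache.hits, cache.misses)
```

Because the key is computed after preprocessing, an edit that only touches comments still hits the cache, while a change of flags does not.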
Ahem. Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc. Ignoring communications, it takes 0.550 seconds to compile them on ten machines. Is 5.05/0.550 == 10? No, it's about 9.2. Therefore, the speedup is LESS THAN N. Note that the faster the actual compile time, the lower the speedup would be!
Cantankerous old coot since 1957.
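The arithmetic in that post can be reproduced with a few lines of Python. This is only a sketch of the simplified model used there (one compiler start-up per machine, files split evenly, no communication cost); the function name and defaults are illustrative.

```python
def build_time(files, machines, init_s=0.05, per_file_s=0.5):
    """Wall-clock seconds when files are split evenly and each machine
    starts the compiler once."""
    files_per_machine = -(-files // machines)  # ceiling division
    return init_s + files_per_machine * per_file_s


one = build_time(10, 1)    # one copy of gcc compiling ten files
ten = build_time(10, 10)   # ten machines, one file each
print(one, ten, one / ten)

# Faster per-file compiles make the fixed start-up cost matter more.
fast_one = build_time(10, 1, per_file_s=0.1)
fast_ten = build_time(10, 10, per_file_s=0.1)
print(fast_one / fast_ten)
```

The second ratio comes out lower than the first, which is the point of the closing remark about faster compile times.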
slightly off topic, but i've found ccache to be amazing at speeding up compiles when developing code.
it basically hashes (after a cpp pass) and caches. a lot of times one has to make clean, tweak a Makefile.am, change a preprocessor variable, or work with multiple different branches, such that most source files are still exactly the same. in that case, huge speedups.
-- p
Wow multiple computers compiling one thing, imagine a beowulf clus.... errr nevermind
Running different versions could cause really nasty problems if different versions of gcc support different levels of C (like C99 or older C) or if one version has a compiler bug that another doesn't.
Can you imagine code compiling or failing to compile randomly depending on which machine happens to compile it? Yikes! Debugging nightmare...
The cake is a pie
Oh well, I've been trying to get a distributed kernel compilation system working using Platform's LSF. Guess I can throw that project away. ;)
-- Minds are like parachutes... they work best when open.
(why am I perpetuating this?)
i've recently written some dist compile tool using a different approach after i've been using distcc for half an hour ...
the big problem with distcc is that it does all the preprocessing on one machine, which is really an overkill in some situations (and limits the total speed increase one can gain).
what i've done is first running a modified version of make and then distribute all the objects which have to be compiled to the machines.
everything is done on these machines (including preprocessing) which only limits the number of compile machines by the speed of the network (i've been compiling on 60 hosts, with almost linear speed increase).
the only problem involved with this approach is that the same compiler has to be installed on these machines, and that they have to write on some sort of network shared filesystem (for the objects)
the same compiler is an easy thing since i've been using the intel compiler version 6 and common system includes (i've put them into a shared include dir)
any thoughts about this thing ? (or some folks willing to help me create a version basing on gpl stuff so i could release this one ?)
Distcc has a problem with gdb.
It appears when a file is pre-processed on the host machine gcc does not record the directory to the pre-processed output file. When the pre-processed file is farmed out to a remote gcc build machine the remote gcc compiler (not knowing any better) compiles the file and records the remote machine's directory to the object file. Now when a user tries to debug the program gdb cannot find the source directories.
Unfortunately, this "debugging" bug has to be fixed within GCC itself. This thread describes how GCC might be patched to allow gdb to work correctly with distcc, but at this time no action was taken.
This is not a huge problem - distcc is still great for production builds.
looks like this one is not necessarily a good idea to run on a university workstation cluster...
1.4 Security Considerations
distcc should only be used on networks where all machines and all users are trusted.
The distcc daemon, distccd, allows other machines on the network to run arbitrary commands on the volunteer machine. Anyone that can make a connection to the volunteer machine can run essentially any command as the user running distccd.
distcc is suitable for use on a small to medium network of friendly developers. It's certainly not suitable for use on a machine connected to the Internet or a large (e.g. university campus) network without firewalling in place.
inetd or tcpwrappers can be used to impose access control rules, but this should be done with an eye to the possibility of address spoofing.
In summary, the security level is similar to that of old-style network protocols like X11-over-TCP, NFS or RSH.
time is a funny concept
Once upon a time, Symantec had a C++ compiler, and with version 7.5 (1996), the build process could be spread all over a network. This did speed up compilation times as machines that were running the build service that were more or less idle would be sent files to compile, passing back the objects and binaries as appropriate.
Oh, by the way, that compiler is now called Digital Mars C++.
That said, all the machines on the network had to be running Windows (and at that time, I think Windows 95 and NT were the only choices available for that compiler). Further, all had to have the same version of the compiler.
For those of us that are running Linux boxes on a primarily Windows network, this system, whether GCC or something else, would be rather hard to implement without a cross-compiler. Additionally, even if all were Linux workstations (or BSD, or Solaris, etc etc etc) wouldn't binary compatibility be driven by not just the version of the compiler but the target OS as well?
It's a noble undertaking. I hope that the developers are putting thought into all the little things like this that will make it tough to pull off.
Be excellent to each other. And... PARTY ON, DUDES!
Imagine a Beowulf cluster of these things!
1. Invent front end for open source compiler
2. ???
3. Profit!
gcc is not a fast C compiler. It is a portable C compiler and it makes pretty good code, but it is not fast. g++ is a slow C++ compiler.
The Codewarrior compilers were fast compilers that made pretty good code.
It's all in the tradeoffs.
You can't have it all. Apple has been adding engineering hours to improve speed with gcc, but gcc will always value source language selection and target portability above all else.
Emphasis is mine.
I guess it all depends on whether or not you want to work with production quality code or not.
using distcc and ccache
from the above link:
distcc & ccache
Has anybody yet thought of integrating distcc & ccache?
Yes, of course. They work pretty well as separate programs that call each other: you can just write CC='ccache distcc gcc'. (libtool sometimes has trouble with that, but that's a problem that we will fix shortly and it is a separate question.)
This is very nearly as efficient as making them both part of a single program. The preprocessor will only run once and the preprocessed source will be passed from ccache straight to distcc. Having them separate allows them to be tested, debugged and released separately.
Unless there's a really strong argument to do otherwise, having two smaller programs is better practice, or at least more to my taste, than making one monolithic one.
I just tried this out as a test on my two machines (nothing fancy, duron 1.0 ghz and a pIII 450).
I compiled xine-lib-0.9.13 - took 8 mins 13 seconds on just the duron system, and 6 mins 28 seconds using distcc.
I think on monday I may try this out in the lab I have access to at my university - 41 PIII 800s.
I just wish I knew about this when I installed gentoo on my duron - even the ~25% increase in speed would have been really nice .
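The "~25%" figure can be read two ways, so here is a quick check of the reported xine-lib times in Python. Only the two timings come from the post; the variable names are mine.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


local = to_seconds(8, 13)         # duron alone
distributed = to_seconds(6, 28)   # duron + pIII 450 via distcc

speedup = local / distributed          # how many times faster the build ran
time_saved = 1 - distributed / local   # fraction of wall-clock time saved

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
```

Measured as throughput the build is a bit over a quarter faster; measured as wall-clock time saved it is closer to a fifth.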
I'd love a speedup, but the time I'd save compiling would be wasted on having to fully install another linux box. Being able to boot a CD with a live linux distro and this software, and then connect to these slave machines to help compile would be immensely helpful. My linux box is a Cyrix 200MHz PC. Being able to stick a CD into my Athlon 1800 to help the compile would be fabulous.
There is an alternative (http://gecc.sf.net). gecc takes a slightly different approach: it has a central component that distributes the compilation to a number of compile nodes. The set of compile nodes may change over time. That is: compile nodes may come and go.
gecc is a work in progress and distcc is much more mature, but maybe you would like to take a look at gecc as well.
(yes, gecc is my baby)
Compilation is inherently memory-latency-bound. That is, making your RAM faster is the only way to decrease compilation times. Faster CPUs (or more CPUs) just decrease the time it takes to get to the saturation point.
Compiling across n machines will make it faster, but not by much. They list around 50-60% reductions in compilation times, and even admit that it doesn't scale past about n=3.
This is because compilation is inherently combinatorial. Each piece you compile depends on others, so you can't just upload a .c file somewhere and tell the machine to compile it...you need all the headers and libraries as well. For sufficiently complex programs, you'd end up sending a complete copy of the source to each compilation machine, at which point you no longer have a distributed system.
What about a project with
Will I retire or break 10K?
Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc.
Then you go on to tell how using ten machines provides only a 9.2-fold speedup. But what about a project with 100 files? It would take 50.05 seconds to build everything on one machine, and it takes 5.050 seconds to build ten files on each machine. Now we have a 50.05/5.050 == 9.91 fold speedup. In practice, can you notice the difference between 9-fold and 10-fold speedup?
Does the speedup factor not approach the number of machines asymptotically?
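Under the simplified model used in this exchange (one compiler start-up per machine, files split evenly, no communication cost) the answer can be worked out directly. The symbols are introduced here for illustration: t_i is the start-up time, t_c the per-file compile time, F the number of files and N the number of machines, with F divisible by N.

```latex
S(F) = \frac{t_i + F\,t_c}{t_i + \frac{F}{N}\,t_c}
     = \frac{\frac{t_i}{F} + t_c}{\frac{t_i}{F} + \frac{t_c}{N}}
\;\xrightarrow{\;F \to \infty\;}\; \frac{t_c}{t_c / N} = N
```

So in this model the speedup stays below N for any finite F and approaches N as the project grows. With t_i = 0.05, t_c = 0.5, N = 10 and F = 100 it is 50.05/5.05, just under 10.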
(How can I "Use the Preview Button!" when an accidental Enter keypress in the Subject invokes the Submit button? Scoop gets it right by setting Preview as the default button.)
Will I retire or break 10K?
you could combine the distribution with object file caching. gecc (http://gecc.sf.net) shows this. It is a work in progress but you could take a look at it if you are interested.
Ever heard of compile time dependencies? Usually a file depends on other files being compiled first.
But not every single file depends on all the previous files.
so there is no room for parallelism.
Most projects with heavy compile-time dependencies have several stages. I'll take the example of the Allegro library: 1. preparing include files, 2. compiling the .o files, 3. creating the library, 4. compiling and linking the driver programs, 5. make install. All the substeps in each major step can usually be done in parallel because they depend only on previous major steps.
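The staged structure described above can be sketched like this in Python: stages run one after another, and the tasks inside a stage run in parallel. The helper name and the stand-in tasks are hypothetical; a real build would launch compiler processes instead of returning strings.

```python
from concurrent.futures import ThreadPoolExecutor


def run_staged(stages, workers=4):
    """Run each stage's tasks in parallel; a stage starts only after the
    previous stage has finished completely."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for tasks in stages:
            # list() waits for every task in this stage before moving on.
            results.append(list(pool.map(lambda task: task(), tasks)))
    return results


if __name__ == "__main__":
    # Stand-in tasks that only report what they would have built.
    stages = [
        [lambda: "include files"],
        [lambda: "a.o", lambda: "b.o", lambda: "c.o"],
        [lambda: "library"],
    ]
    print(run_staged(stages))
```

The sequential part of the build is then the number of stages, not the number of files.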
Will I retire or break 10K?
I followed the 30second install overview, and although it took 20minutes to get working, that 20minutes included a distributed kernel compile across 2 of my systems here: a p3-500 and p3-700, 512meg ram each. very nice. i like it!
Run a gcc cross compiler on the remote machine that will make Linux object files. distcc is operating system agnostic. For distcc's purposes you don't even need the target operating system's header files or runtime libraries since the files are already pre-processed on the host machine, and only compilation of .o files takes place remotely.
Here, for example, is how to build a Linux cross compiler hosted on Cygwin.
even working on dual-processor machines and using 'make -j4' to allow multiple jobs, the nature of how makefiles calculate dependencies can make it very difficult to get much work done in parallel. other 'make'-like systems, such as Jam/MR, do a better job of evenly building large, multi-directory projects.
This was on Sweetcode months ago.
My most common problem these days is *not* compile time, but *link* time. Would be nice if there was a way to speed that up somehow.
-Bill
SlashSig Karma: Excellent (mostly affected by moderatio
Wouldn't be necessary if GCC weren't such a slow bloated pig of a compiler.
"Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
This does the same and more:
http://openmosix.sourceforge.net/
Like one of the posters below, I'm pretty annoyed that there isn't much support for faster compile options in Project Builder yet (even parallel compiles are pretty buggy). We tried using an earlier version of distcc on OSX without much luck but it's been a couple of months so we'll have to try it again. Certainly after having the luxury of Incredibuild for our PC compiles, we're looking forward to a Mac solution for distributed compiles.
Okay.. I have to trot out my old record of compiling GNU emacs 19.34b.
In 1997, I built that version on an SGI Origin 2000 with 8 195 Mhz R10000 CPUs and SGI's parallel make. No modifications to makefiles, etc were necessary.
That machine could go from a 'make clean' to a fully built and dumped emacs with X11 in 9.2 seconds.
It was just awesome. I'd love to see that record crushed by faster SGI's or other hardware. Anyone?
How ironic, I got an ad for Visual Studio .Net when I opened up this page.
The sun beams down on a brand new day, No more welfare tax to pay, Unsightly slums gone up in flashing light...
I agree that gcc has different goals, but comeon, the speed difference is really flabbergasting ! codewarrior is twice as fast ! I can understand a speed loss due to portability, but that much ??
Anyways, it doesn't all matter that much if there were to be a speed gain from dual procs or distributed compiling. So to restate the Q : do you have any idea if PB is faster on a dual1GHz than on a single 1GHz ?
When will I end this grieving ? When will my future begin ?
There's also compilercache. The homepage details some of the benefits of using this to supplement make when, for example, changing compiler flags.
Here is a list: :-)
... right?
1157 bugs found
Yeah I know, creating new features is much more fun
And we don't sell the gcc
If you have headers included everywhere that have to be changed often, you've got a bad design.
You'd be amazed at how often it is easy to get rid of those stupid "include everywhere" headers.
The cake is a pie
If you're looking for something on the windows side Bloodshed.net puts out a great free (free as in GNU OSS) IDE that can use GCC as the compiler. The newest version (5.0) is still in beta but I have used it for all of my projects this semester and it hasn't crashed once (All console apps but way more complicated than "hello world" and it hasn't let me down once).
http://www.bloodshed.net
Insert sig here (slashdot) Insert cig here (Lewinsky)
It took me several hours to correctly build a gcc-3.2 Windows (cygwin) hosted, Linux targeted compiler (see instructions in parent) and to build distcc on cygwin - but the thing actually works. I can launch a `make' on linux and distcc actually performs the compiles via cross compiler on Windows.
The Windows box is several times faster than the Linux box in the test. I have distccd (the distcc daemon) running on the Windows box. The linux cross compiler on the Windows box is named 'i686-pc-linux-gnu-g++' so it is not confused with my native Windows Cygwin compiler.
Timing for a completely local build on Linux:
(linux) $ time make CCC=g++
g++ -c FileA.cpp
g++ -c FileB.cpp
g++ -o Program FileA.o FileB.o
7.40user 0.25system 0:07.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3097major+7464minor)pagefaults 0swaps
And now a timing with distcc farming compiles to the remote Windows box:
(linux) $ export DISTCC_HOSTS='windowsbox'
(linux) $ time make -j2 CCC='distcc i686-pc-linux-gnu-g++'
distcc i686-pc-linux-gnu-g++ -c FileA.cpp
distcc i686-pc-linux-gnu-g++ -c FileB.cpp
distcc i686-pc-linux-gnu-g++ -o Program FileA.o FileB.o
1.09user 0.39system 0:03.44elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2059major+1207minor)pagefaults 0swaps
I know my test file sizes are a joke, but it shows a distcc distributed build time of 3.44 seconds versus a regular (non-distributed) build time of 7.66 seconds.
Twice as fast as a local build.
So put those idle Windows to work!
Will it work with gcc 2.95.3?
"With Microsoft, you get Windows. With Linux, you get the full house" - unknown
Later, we upgraded to Motorola 88k. The new system came with a GreenHills compiler - which was so buggy it was unusable. The bugs were acknowledged, but never fixed. It was too buggy to compile gcc - I wasted many days simplifying expressions by hand to no avail.
My solution was to build a cross compiler on the 68k, and run it with a stub on the 88k for cc1 that fed the preprocessed source to the compiler on the 68k, and got back the assembler source. (There was no 88k support yet in gcc, so I had to write my own machine model. It was incompatible with Greenhills in passing and returning structs.) The preprocessor, assembler, and linker would run on the 88k - only the actual compiler pass ran on the 68k. This worked beautifully to build gcc on the 88k! And our 33Mhz 88k was so much faster, that I built a 68k cross compiler for the 88k and speeded up compiles on the 68k a great deal.
Next we upgraded to Power PC. AIX comes with no compiler at all! (IBM's compiler is very good - but expensive.) Fortunately, it comes with an assembler and linker. By dint of copying headers from AIX, and hand preprocessing and compiling to assembler, I got the preprocessor running on AIX. Then it was simple to run the cc1 stub again to compile up a native gcc for AIX. The new PowerPC was so much faster than the old 88k, that a cross compiler to speed up 88k compiles was in order also. (I contributed AIX shared lib support for gcc.)
Recently, I fired up a PowerPC cross compiler on our 600Mhz PII running Linux to speed up compiles on our old 100Mhz PowerPC AIX using the same simple technique. By using the -pipe option to gcc, the compiler on Linux runs in parallel with the preprocessor and assembler on AIX - truly efficient.
In conclusion, I want to thank Richard Stallman and the FSF for making it possible to rise above the stupid problems caused by closed source. Before the 68k, we had SCO Xenix. Basic utilities like tar and cpio were broken, reported, and never fixed - but GNU software was there with solid and reliable replacements. (Yes, I made donations and even ordered a tape.)
I spent a couple of weeks tinkering with the idea a while back and wrote a server script in Perl that can run on any Win32 box (no compilers or anything necessary to be installed), and wrote a client script that plugs into MSVC++'s IDE. The client parses the build commands and contacts all servers through a UDP broadcast, then connects to each server, preprocesses a file at a time, transfers the processed C/C++ file and the compiler and its related binaries to the server, then sends the build commands to the server as well. All output is captured on the server and sent back to the client.
It worked pretty well, except I had a lot of problems getting fork to work right in Perl on my Win32 box without crashing, so I could get good parallelism. It even fell back to using the local machine in the event of a failure remotely at any point, so on a multiprocessor development machine and no servers to connect to, it would actually use all the processors to compile--something MSVC doesn't do normally.
The nice thing about the server script is it only has three functions: accept a file, send a requested file, and execute an arbitrary command. So it doesn't really care about what it's doing. At work, we're planning to use it to leech some cycles off the receptionist machines and managers' boxen (we all know they don't do anything all day but email anyway!).
The real limiting factor is that preprocessing is relatively slow due to seeks on the hard drive, where not all the headers fit in the disk cache at once. This is a bigger bottleneck than you'd believe.
For what it's worth, I've learned of another company that concatenated all their .cpp files together and only compiled that one file every time, so their build was a fixed (short) cost, and never hit any header twice. Dirty code that has module-local statics would choke on that technique, but for good code, it's prolly smarter than distributing it.
Any connection between your reality and mine is purely coincidental.
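The "concatenate all the .cpp files" trick mentioned in that post is usually done by generating one file that #includes the others. A minimal sketch in Python, with made-up file names; this is a guess at the general technique, not a description of how that company actually did it.

```python
def unity_source(cpp_files):
    """Return the text of a single translation unit that pulls in every
    listed .cpp file via #include."""
    lines = ["// Generated unity translation unit (illustrative)."]
    for path in cpp_files:
        lines.append(f'#include "{path}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file list; a real script would scan the source tree.
    print(unity_source(["FileA.cpp", "FileB.cpp"]), end="")
```

Since everything ends up in one translation unit, two files that each define a file-local static with the same name will now collide, which is the "dirty code" caveat from the post.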
I just built something much larger: the Parrot (perl6) VM: ;-)
266 MHz Linux box without distcc: 6 minutes, 45 seconds.
266 MHz Linux box + distcc + 2.0 GHz Windows box (with linux cross compiler): 1 minute, 52 seconds.
Who says Windows boxes are good for nothing?
Would it not be simple to just patch make for options to run multiple commands in parallel, which would be automatically sped up by clusters?
"Oh, I have to build a dozen other targets for this target and none of them depend on each other. I'll run them all together!"
It could even be faster on a single system, depending on certain variables. Ah, well, I'll get to work on it later this week.
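The "none of them depend on each other, run them together" idea is what 'make -j' (mentioned elsewhere in this thread) already relies on. As a rough sketch of the scheduling part only, here is how targets can be grouped into waves with Python's standard graphlib module (Python 3.9+). The dependency table is a made-up example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_waves(deps):
    """Group targets into waves. Every target in a wave has all of its
    prerequisites in earlier waves, so a wave could be built in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical makefile: target -> set of prerequisites.
    deps = {
        "Program": {"FileA.o", "FileB.o"},
        "FileA.o": {"FileA.cpp"},
        "FileB.o": {"FileB.cpp"},
    }
    for wave in parallel_waves(deps):
        print(wave)
```

Each printed wave is a set of commands that could be handed to local processes or to remote machines at the same time.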
Question
http://www.ironfroggy.com/
Interestingly, on the gcc mailing list the Apple coders who are working on compilation speed issues are realistically talking about achieving up to a 36x speed-up with a few months' work: 6x by using precompiled headers and another 6x by improving some of the most unsophisticated/unsuitable/stupid algorithms used in gcc.
If you're compiling over the net and can't live with NFS (or samba) look at AFS, it's nicely secure, and is designed to work over the Internet.
/* FUCK - The F-word is here so that you can grep for it */
A distributed front end for compilation/gcc is nice. However, this idea has been implemented before, and in a more general fashion. :-)
PPMAKE works as a distributed front end for GNU make. This works brilliantly (I used it as a part of my MSc thesis), and distributes arbitrary compilation work in arbitrary languages to a preconfigured list of hosts.
I remember using a cluster of approximately 30 hosts, and this worked well, but it required a fast master.
This is available as a package for at least Suse.
Requires PVM to be installed, though. But this is usually no problem.
Not entirely true. Some corner cases regarding alignment of bitfields or so have changed between 3.0 and 3.2. Most people don't run into such problems, but the kernel does.