A Distributed Front-end for GCC

Posted by CowboyNeal on Saturday October 12, 2002 @03:35AM from the when-one-compiler-is-not-enough dept.

format writes "distcc is a distributed front-end for GCC, meaning you can compile that big project across n number of machines and get it done almost n times as fast. The machines don't have to be identical or be running the exact same GCC version, but having the same OS is helpful." With the advent of faster hardware, I can't complain about kernel compile times anymore, but larger source trees could definitely benefit from this.

13 of 195 comments

Don't need the same version? by Anonymous Coward · 2002-10-12 03:38 · Score: 1, Interesting

That doesn't make too much sense. What if I had 50% 2.9 machines and 25% 3.2 machines, and a bunch mixed in-between? How would it know which version I wanted my program compiled with?
Interesting approach by PineGreen · 2002-10-12 03:39 · Score: 5, Interesting

The Sun compiler suite comes with dmake, which does the same thing at the level of make rather than cc, but is essentially the same.
This would definitely make Beowulf clusters interesting for compilation as well as for hard-core numerics (no joke intended).
So, is it better? by FreeLinux · 2002-10-12 03:58 · Score: 5, Interesting

Is this better than, say, Group Compiler?
Difference between distributed/parallel make? by angel'o'sphere · 2002-10-12 03:59 · Score: 4, Interesting

Could someone please point out the difference between a parallel and/or distributed make, like pmake?

It doesn't sound very reasonable to put the coding work into gcc when you'd like yacc/bison, a bunch of Perl scripts, and whatever else is in your makefile sped up as well.

Regards,
angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Big benefit by LoudMusic · 2002-10-12 04:07 · Score: 5, Interesting

I think the biggest plus is that you can have one hella fast machine on your network running distcc that basically does all your compiling for all your other machines. I can see this being a big bonus for server farms like rackspace.com. The customers would be getting compile speeds from a big ass server, rather than just their little dinky Duron.

~LoudMusic

--
No sig for you. YOU GET NO SIG!
No, NOT N by HisMother · 2002-10-12 04:21 · Score: 3, Interesting

Ahem. Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc. Ignoring communications, it takes 0.550 seconds to compile them on ten machines. Is 5.05/0.550 == 10? No, it's about 9.2. Therefore, the speedup is LESS THAN N. Note that the faster the actual compile time, the lower the speedup would be!
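The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of Amdahl's law under the comment's own assumptions (a 50 ms start-up cost that is not divided among machines, 500 ms per file, communication ignored); the function and parameter names are made up for illustration. The second call shows the last point: with a hypothetical 100 ms compile per file, the fixed cost weighs more and the speedup drops.

```python
def speedup(serial_ms: float, parallel_ms: float, machines: int) -> float:
    """Amdahl's law: only `parallel_ms` of the work splits across machines.

    T1 = serial + parallel, Tn = serial + parallel / n.
    Communication cost is ignored, as in the comment.
    """
    t_one = serial_ms + parallel_ms
    t_n = serial_ms + parallel_ms / machines
    return t_one / t_n


# The comment's numbers: 50 ms gcc start-up, ten files at 500 ms each.
slow_files = speedup(serial_ms=50, parallel_ms=10 * 500, machines=10)
# Hypothetical faster compile (100 ms per file): the fixed part matters more.
fast_files = speedup(serial_ms=50, parallel_ms=10 * 100, machines=10)

print(f"{slow_files:.2f}")  # prints 9.18  (5050 / 550)
print(f"{fast_files:.2f}")  # prints 7.00  (1050 / 150)
```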

--
Cantankerous old coot since 1957.
Is this thing really useful? by nomenquis · 2002-10-12 04:33 · Score: 2, Interesting

I recently wrote a distributed compile tool that takes a different approach, after using distcc for half an hour... The big problem with distcc is that it does all the preprocessing on one machine, which is really overkill in some situations (and limits the total speedup one can gain). What I did was first run a modified version of make and then distribute all the objects that have to be compiled to the machines. Everything is done on those machines (including preprocessing), so the number of compile machines is limited only by the speed of the network (I've been compiling on 60 hosts, with an almost linear speedup). The only problems with this approach are that the same compiler has to be installed on all of the machines, and that they have to write to some sort of network shared filesystem (for the objects). Having the same compiler is easy, since I've been using the Intel compiler version 6 and common system includes (I've put them into a shared include dir). Any thoughts about this? (Or are some folks willing to help me create a version based on GPL stuff so I could release it?)
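Why central preprocessing caps the speedup can be seen in a back-of-the-envelope pipeline model. This is a sketch, not how distcc or the poster's tool actually schedules work: it assumes one coordinator preprocesses every file serially while the hosts compile in parallel, so steady-state throughput is set by the slower stage. The per-file timings (50 ms preprocess, 450 ms compile) and all names are hypothetical, and pipeline fill, network transfer, and start-up costs are ignored.

```python
def central_preprocess_ms(files: int, hosts: int, pre_ms: float, cc_ms: float) -> float:
    """One machine preprocesses every file; `hosts` machines compile.

    The pipeline emits one object per max(pre_ms, cc_ms / hosts) ms,
    i.e. it runs at the pace of its slower stage.
    """
    return files * max(pre_ms, cc_ms / hosts)


def fully_distributed_ms(files: int, hosts: int, pre_ms: float, cc_ms: float) -> float:
    """Every host preprocesses and compiles its own share of the files."""
    return files * (pre_ms + cc_ms) / hosts


FILES, PRE_MS, CC_MS = 600, 50, 450  # hypothetical workload
serial = FILES * (PRE_MS + CC_MS)    # everything on one machine

for hosts in (5, 10, 60):
    central = serial / central_preprocess_ms(FILES, hosts, PRE_MS, CC_MS)
    spread = serial / fully_distributed_ms(FILES, hosts, PRE_MS, CC_MS)
    print(f"{hosts:>2} hosts: central {central:5.1f}x, distributed {spread:5.1f}x")
```

With these made-up numbers the central-preprocessing case tops out at (50 + 450) / 50 = 10x no matter how many hosts are added, while the fully distributed case keeps scaling, which is the effect described above. For small host counts the central case can even look slightly better, because in this model the preprocessing machine acts as an extra worker.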
Best of both worlds: distcc with ccache by truth_revealed · 2002-10-12 05:11 · Score: 2, Interesting

using distcc and ccache

from the above link:

distcc & ccache

Has anybody yet thought of integrating distcc & ccache? Yes, of course. They work pretty well as separate programs that call each other: you can just write CC='ccache distcc gcc'. (libtool sometimes has trouble with that, but that's a problem that we will fix shortly and it is a separate question.) This is very nearly as efficient as making them both part of a single program. The preprocessor will only run once and the preprocessed source will be passed from ccache straight to distcc. Having them separate allows them to be tested, debugged and released separately. Unless there's a really strong argument to do otherwise, having two smaller programs is better practice, or at least more to my taste, than making one monolithic one.
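The chaining idea in CC='ccache distcc gcc' can be modeled as a cache in front of a compile stage. This is a toy sketch, not ccache's or distcc's real interface: it assumes the cache key is a hash of the preprocessed source plus the compiler arguments, and on a miss it hands the already-preprocessed text to the next stage, so the preprocessor runs only once. `CompileCache` and `fake_distcc` are hypothetical names, and the "object file" is fake.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical stage signature: (preprocessed source, args) -> object bytes.
Compiler = Callable[[str, List[str]], bytes]


class CompileCache:
    """Toy model of a compiler cache sitting in front of a remote compiler."""

    def __init__(self, next_stage: Compiler) -> None:
        self.next_stage = next_stage
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def compile(self, preprocessed: str, args: List[str]) -> bytes:
        # Key on the preprocessed text and the flags; a separator byte
        # keeps the source and the argument list from running together.
        h = hashlib.sha256()
        h.update(preprocessed.encode())
        h.update(b"\0")
        h.update("\0".join(args).encode())
        key = h.hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        obj = self.next_stage(preprocessed, args)  # no second preprocessing pass
        self.store[key] = obj
        return obj


def fake_distcc(preprocessed: str, args: List[str]) -> bytes:
    """Stand-in for the remote compile step: returns a fake object file."""
    return b"OBJ:" + hashlib.md5(preprocessed.encode()).digest()


cache = CompileCache(fake_distcc)
src = "int main(void) { return 0; }\n"
first = cache.compile(src, ["-O2"])   # miss: goes to the "remote" stage
second = cache.compile(src, ["-O2"])  # hit: served from the cache
third = cache.compile(src, ["-O0"])   # different flags: miss again
print(cache.hits, cache.misses)       # prints 1 2
```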
gecc by j.beyer · 2002-10-12 05:31 · Score: 3, Interesting

There is an alternative (http://gecc.sf.net). gecc takes a slightly different approach: it has a central component that distributes the compilation to a number of compile nodes. The set of compile nodes may change over time; that is, compile nodes may come and go.

gecc is a work in progress and distcc is much more mature, but maybe you'd like to take a look at gecc too.

(yes, gecc is my baby)
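The gecc description (a central component handing work to a set of compile nodes that may come and go) can be sketched as a small dispatcher. This is a hypothetical illustration, not gecc's actual design or code: it assumes a least-loaded assignment policy and assumes that jobs held by a node that leaves are put back at the front of the queue. All class and method names are made up.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Coordinator:
    """Toy central dispatcher with a changing pool of compile nodes."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.assigned: Dict[str, List[str]] = {}  # node -> jobs in flight

    def join(self, node: str) -> None:
        self.assigned.setdefault(node, [])

    def leave(self, node: str) -> None:
        # Requeue the departing node's unfinished jobs so they run next.
        for job in reversed(self.assigned.pop(node, [])):
            self.pending.appendleft(job)

    def submit(self, job: str) -> None:
        self.pending.append(job)

    def dispatch(self) -> Optional[Tuple[str, str]]:
        """Give the next pending job to the least-loaded live node."""
        if not self.pending or not self.assigned:
            return None
        node = min(self.assigned, key=lambda n: len(self.assigned[n]))
        job = self.pending.popleft()
        self.assigned[node].append(job)
        return node, job

    def finish(self, node: str, job: str) -> None:
        self.assigned[node].remove(job)


c = Coordinator()
for f in ("a.c", "b.c", "c.c"):
    c.submit(f)
c.join("node1")
c.join("node2")
first = c.dispatch()   # ('node1', 'a.c')
second = c.dispatch()  # ('node2', 'b.c')
c.leave("node1")       # node1 vanishes; a.c goes back to the queue
print(first, second, list(c.pending))
```

Requeueing at the front is one possible choice; it keeps a file whose node disappeared from waiting behind the rest of the build.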
oops... take 2 by yerricde · 2002-10-12 05:53 · Score: 3, Interesting

Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc.

Then you go on to show how using ten machines provides only a 9.2-fold speedup. But what about a project with 100 files? It would take 50.05 seconds to build everything on one machine, and 5.050 seconds to build ten files on each of ten machines. Now we have 50.05/5.050, about a 9.91-fold speedup. In practice, can you notice the difference between a 9-fold and a 10-fold speedup?

Does the speedup factor not approach the number of machines asymptotically?
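The question can be checked numerically. This sketch keeps the thread's assumed numbers (50 ms start-up per machine, 500 ms per file, no communication cost) and assumes the busiest machine, with ceil(files / machines) files, sets the total time. The names are hypothetical.

```python
import math

INIT_MS = 50      # assumed one-time gcc start-up per machine (from the thread)
COMPILE_MS = 500  # assumed compile time per file (from the thread)


def build_ms(files: int, machines: int) -> int:
    """Wall-clock time with files spread as evenly as possible.

    Each machine pays the start-up cost once; the busiest machine
    (ceil(files / machines) files) determines the total.
    """
    per_machine = math.ceil(files / machines)
    return INIT_MS + per_machine * COMPILE_MS


def speedup(files: int, machines: int) -> float:
    return build_ms(files, 1) / build_ms(files, machines)


for files in (10, 100, 1000, 10000):
    print(files, f"{speedup(files, 10):.3f}")
# prints 9.182, 9.911, 9.991, 9.999 for the four file counts
```

So yes: as long as the files divide evenly, the one-time start-up cost is amortized over more and more work and the speedup approaches the number of machines from below, without ever reaching it.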

(How can I "Use the Preview Button!" when an accidental Enter keypress in the Subject invokes the Submit button? Scoop gets it right by setting Preview as the default button.)

--
Will I retire or break 10K?
distcc is pretty sweet :) by accessdeniednsp · 2002-10-12 06:14 · Score: 2, Interesting

I followed the 30-second install overview, and although it took 20 minutes to get working, those 20 minutes included a distributed kernel compile across two of my systems here: a P3-500 and a P3-700, 512 MB RAM each. Very nice. I like it!
Re:Only true for C++ by Scott+Wood · 2002-10-12 09:28 · Score: 2, Interesting

Regardless, it is a point of incompatibility. I'm aware of the bug because I've seen reports of customers linking our code against code built with 2.96 and experiencing failures as a result. So code exists that is affected by the bug. Whether the "majority" of apps use this feature, or whether the apps you use use it (have you grepped every bit of source code you run? You may find the Linux kernel an interesting place to look...), is irrelevant.
And no, the answer is not to remove the alignment, as it speeds up array accesses by making the size of the struct a power of 2.
Good for boot strapping too by CustomDesigned · 2002-10-12 15:52 · Score: 2, Interesting

I did a simpler version of this many years ago. We had a 68K Unix system from Motorola. The pcc compiler was really, really bad, so we installed gcc. (I've heard a lot of complaints about how the gcc optimizer is no good, but it beat the pants off pcc.)
Later, we upgraded to a Motorola 88k. The new system came with a Green Hills compiler, which was so buggy it was unusable. The bugs were acknowledged, but never fixed. It was too buggy to compile gcc; I wasted many days simplifying expressions by hand, to no avail.
My solution was to build a cross compiler on the 68k and, on the 88k, run a stub in place of cc1 that fed the preprocessed source to the compiler on the 68k and got back the assembler source. (There was no 88k support in gcc yet, so I had to write my own machine model. It was incompatible with Green Hills in passing and returning structs.) The preprocessor, assembler, and linker ran on the 88k; only the actual compiler pass ran on the 68k. This worked beautifully to build gcc on the 88k! And our 33 MHz 88k was so much faster that I built a 68k cross compiler for the 88k and sped up compiles on the 68k a great deal.
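The cc1 stub trick described above (ship the preprocessed source to a faster machine, get assembler source back) can be sketched as a tiny client/server exchange. This is a hypothetical illustration, not the poster's stub: it assumes a length-prefixed message over TCP, runs both ends on localhost in one process, and replaces the real cross compiler with a fake `fake_cross_cc1` that only emits a placeholder line. All names here are made up.

```python
import socket
import struct
import threading


def send_msg(sock: socket.socket, data: bytes) -> None:
    # Framing: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> bytes:
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


def fake_cross_cc1(preprocessed: bytes) -> bytes:
    """Stand-in for the real cross compiler pass on the fast machine."""
    lines = preprocessed.decode().count("\n")
    return f"; assembler for {lines} preprocessed lines\n".encode()


def serve_once(listener: socket.socket) -> None:
    # "Fast machine": receive preprocessed source, reply with assembler.
    conn, _ = listener.accept()
    with conn:
        send_msg(conn, fake_cross_cc1(recv_msg(conn)))


def cc1_stub(host: str, port: int, preprocessed: bytes) -> bytes:
    """Runs in place of cc1 on the slow machine: source out, assembler back."""
    with socket.create_connection((host, port)) as sock:
        send_msg(sock, preprocessed)
        return recv_msg(sock)


listener = socket.socket()            # TCP over IPv4 by default
listener.bind(("127.0.0.1", 0))       # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
asm = cc1_stub("127.0.0.1", port, b"int x;\nint main(void) { return x; }\n")
server.join()
listener.close()
print(asm.decode(), end="")  # prints "; assembler for 2 preprocessed lines"
```

Because only the middle pass crosses the wire, the slow machine still runs its own preprocessor, assembler, and linker, matching the split described in the story above.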
Next we upgraded to PowerPC. AIX comes with no compiler at all! (IBM's compiler is very good, but expensive.) Fortunately, it comes with an assembler and linker. By dint of copying headers from AIX, and hand-preprocessing and compiling to assembler, I got the preprocessor running on AIX. Then it was simple to run the cc1 stub again to compile a native gcc for AIX. The new PowerPC was so much faster than the old 88k that a cross compiler to speed up 88k compiles was in order too. (I contributed AIX shared library support to gcc.)
Recently, I fired up a PowerPC cross compiler on our 600 MHz PII running Linux to speed up compiles on our old 100 MHz PowerPC AIX box, using the same simple technique. With gcc's -pipe option, the compiler on Linux runs in parallel with the preprocessor and assembler on AIX: truly efficient.
In conclusion, I want to thank Richard Stallman and the FSF for making it possible to rise above the stupid problems caused by closed source. Before the 68k, we had SCO Xenix. Basic utilities like tar and cpio were broken, reported, and never fixed, but GNU software was there with solid and reliable replacements. (Yes, I made donations and even ordered a tape.)