Slashdot Mirror


Reduce C/C++ Compile Time With distcc

An anonymous reader writes "Some people prefer the convenience of pre-compiled binaries in the form of RPMs or other such installer methods. But this can be a false economy, especially with programs that are used frequently: precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine. If you use a distributed compiler, you get the best of both worlds: fast compiles and faster apps. This article shows you the benefits of using distcc, a distributed C compiler based on gcc, which can give you significant productivity gains."

24 of 292 comments

  1. DistCC is good, but there's some info missing by PacketCollision · · Score: 5, Informative

    While distCC is a great tool, there are a couple things to mention. First, the article blurb states that distCC is "a distributed compiler based on GCC." It is actually a method of passing files to GCC on a remote computer in such a way that the build scripts think it was done locally.

    The article also says that other than distCC, the computers need not have anything in common; this is not strictly true. Different major versions of GCC can cause problems if you are trying to compile with optimization flags that are only on the newer version. I have run into this on my gentoo box, trying to use an outdated version of GCC on a redhat box.

    Another thing is that some very large packages have trouble with distributed building of any sort (either multiple threads on the same machine, or over a network like with distCC). As far as I know, at least parts of xfree86, KDE and the kernel turn off distributed compiling during the build. Some of this might just be in the Gentoo ebuilds, but I think some of it is in the actual Makefiles. If a program has trouble compiling, it's always worth a shot to turn off distCC.

    A good resource for setting up distCC on a Gentoo system (particularly important there, since so much of using Gentoo is compiling) is gentoo.org's own distCC guide.

    1. Re:DistCC is good, but there's some info missing by PacketCollision · · Score: 2, Informative

      Um...Can I mod myself down -1 Idiot?

      The article clearly states that you have to have the same version of GCC. For some reason I read it as distCC, hence the comment about different versions of GCC causing problems.

      I stand by my other points

    2. Re:DistCC is good, but there's some info missing by XO · · Score: 4, Informative

      Actually, you do NOT need to have the exact same version of GCC. However, there are certain points where the compile options given to one version will fail when given to another version. I used distcc quite happily between my two Red Hat 7.2 boxes and my Debian box until Debian upgraded past GCC 3.2.0 or so, while the Red Hat boxes were, I think, still on 2.96.

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
    3. Re:DistCC is good, but there's some info missing by meowsqueak · · Score: 2, Informative

      Even if the major versions are the same, Gentoo applies patches to gcc 3.3.3 that are not present in Debian's 3.3.4 (the major version of both is 3.3). For example, Debian's gcc doesn't recognise -fno-stack-protector, which Gentoo's does, and distcc fails.

    4. Re:DistCC is good, but there's some info missing by meowsqueak · · Score: 4, Informative

      It's not a requirement, it's a recommendation. Mixing gcc versions will give you unpredictable results, some of which are perfectly acceptable, and some of which are fatal.
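      The conservative reading of the advice in this thread is to give every host an identical compiler build, and meowsqueak's Gentoo/Debian example shows that matching only the "3.3" part of the version isn't enough. A minimal Python sketch of a strict check, assuming you have already collected a version banner from each host yourself (for instance the first line of `gcc --version`); the function names and the banner strings in the usage example are hypothetical:

```python
from typing import Dict, List


def major_minor(version: str) -> str:
    """'3.3.4' -> '3.3'. This is the loose check that is NOT sufficient."""
    return ".".join(version.split(".")[:2])


def mismatched_hosts(local_banner: str, remote_banners: Dict[str, str]) -> List[str]:
    """Return the hosts whose compiler banner differs from the local one.

    The whole banner must match, not just major.minor, because vendor
    patches (such as Gentoo's stack-protector support) can differ between
    two compilers that both call themselves "3.3".
    """
    wanted = local_banner.strip()
    return sorted(host for host, banner in remote_banners.items()
                  if banner.strip() != wanted)


if __name__ == "__main__":
    # Hypothetical banners: both hosts are "3.3", but only one matches exactly.
    local = "gcc 3.3.3 (hypothetical Gentoo-patched build)"
    remote = {
        "debianbox": "gcc 3.3.4 (hypothetical Debian build)",
        "gentoo2": "gcc 3.3.3 (hypothetical Gentoo-patched build)",
    }
    print(major_minor("3.3.3") == major_minor("3.3.4"))
    print(mismatched_hosts(local, remote))
```

      Hosts returned by mismatched_hosts would be candidates to drop from your distcc host list, or to bring onto the same compiler build.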

    5. Re:DistCC is good, but there's some info missing by sonicattack · · Score: 4, Informative

      And I've compiled the Linux kernel with distcc splitting the load over two different architectures - one Intel box running Linux, and one UltraSPARC running Solaris (with a cross-compiling GCC).

      No problems. That I noticed. Wouldn't compile a production system kernel this way, though. :)

    6. Re:DistCC is good, but there's some info missing by jaclu · · Score: 2, Informative

      Well, I'm using Debian unstable on my laptops and workstations, and have been compiling my kernels with CONCURRENCY_LEVEL=10 for the last half year or so with no problems.
      When I look in distccmon, I clearly see parallel compiles, and the compile times clearly show that it's much faster than a single compile would be.

      I never tried compiling KDE / GNOME / X with distcc, so I wouldn't know about them.

      What I have noticed is that on FreeBSD all the ports seem to fail when compiling with -j > 1.

    7. Re:DistCC is good, but there's some info missing by Anonymous Coward · · Score: 1, Informative

      No problems. That I noticed. Wouldn't compile a production system kernel this way, though. :)

      Why not? Last I checked gcc doesn't generate any polymorphic code based on the platform you are compiling on.

  2. I'm not sure how this is news by Anonymous Coward · · Score: 5, Informative

    It's also been discussed here on Slashdot (two years ago!) in "A Distributed Front-end for GCC" and earlier this year in "Optimizing distcc."

    Distcc is great for installing Gentoo on an older computer because you can have other (faster) computers help with the compile, and if you like distcc, you may also like ccache.

  3. Really does help by Mazaev2 · · Score: 2, Informative

    I just installed a second Gentoo box, a lowly Pentium II 400 MHz with 128 MB RAM, and read up on distcc. It certainly makes compiling Gentoo a LOT faster when the otherwise poor underpowered box can ask my Athlon XP 2400 for some compiling help.

    It also requires rather minimal configuration on my part, and for the most part it "Just Works[tm]".

    Hehe.. now if only I had a Beowulf cluster....

  4. distcc is only workaround to an unsolved problem by Rolman · · Score: 2, Informative

    The problem with compiling your own binaries is that you are effectively forking code from the original distribution at the low level. To do this you really must know what you're doing, and that can be a very difficult thing when working with applications you didn't write yourself.

    Just look at the Linux Kernel Mailing List and how many errors can be traced to a specific GCC version. That's why Linus enforces a standardized compiler environment: developers can't be wasting their time fixing compiler-induced errors.

    I know it's attractive to just recompile your whole distribution to your specific hardware combination because there are real world performance gains, but sometimes there are weird bugs caused by it and you'll probably be out of luck trying to find some documentation on them. What are the chances of somebody having the same hardware configuration? And remember we're not talking about branded components and specific models, we must throw in firmware, drivers, BIOS settings and whatnot into the mix.

    As long as PC components are not standardized, this problem is never going to go away. I seriously considered Mandrake and Gentoo a couple of times in the past, and they had very different bugs on each version every time I tried them. Even though they have gotten better with each release, I'd still refuse to put them on a production machine; there's a reason why every distro ships with a precompiled i386 kernel.

    I, for one, just recompile the most important parts of a system that do require most of the CPU time, like the kernel, Apache, and other runtime libraries, whenever I do need that extra punch, not a second before. distcc is a geek tool and has that coolness factor and all, but I'm not in a frenzy to use it to recompile all my servers' software; I care about stability first.

    --
    - Otaku no naka no otaku, otaking da!!!
  5. Re:Any advice on flags for K6-2 CPUs? by EvilTwinSkippy · · Score: 2, Informative
    Fagetaboutit

    -O2 is your best all-around setting. -Os does make smaller code, but the code it outputs is slower. It also causes weird problems with certain apps. It could be useful to condense the memory footprint of properly designed code (like glibc). But remember, decreased memory footprint = more hoops the computer has to go through. Think of it like using fold-out tables: sure, they save space, but you spend time folding and unfolding them.

    -O3 is a waste of time, except for certain scientific computing apps, or apps where you don't mind blowing out your memory for the sake of speed (e.g. games).

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  6. I'm content with .src.rpm, thank you by 21mhz · · Score: 2, Informative

    While I haven't RTFA yet, I find the premise stated in the posting somewhat far-fetched. If I need binaries tuned finer than those provided by binary .rpm's, I can take their respective .src.rpm's and rebuild them. The RPM build system, in the distributions I know, provides a convenient way to override optimization flags via system- or user-settable macros. As for compilation time, it's not an issue for most packages these days, as many Gentoo users here can testify.

    --
    My exception safety is -fno-exceptions.
  7. Re:nc: a better tool for distributed builds by XO · · Score: 5, Informative

    It appears that nc requires both computers to have access to the same filesystem, whereas distcc doesn't. That looks to be a fairly sizeable difference.

  8. Re:Gentoo has that covered by Helamonster · · Score: 3, Informative

    Just last weekend I set up distcc via cygwin on 3 PCs to help my Gentoo box compile. Unfortunately, I wasn't able to successfully compile the cross compiler under cygwin, so I used a pre-built version, available under the Gentoo forums thread linked below. It seems to work well so far, although the Windows boxes are definitely slower than equivalent Linux boxes. But as they are not my computers to begin with, I won't be complaining anytime soon ;)

    Gentoo has a HOWTO entitled:
    "HOWTO: Use a Windows box as a distcc server for linux."

    http://forums.gentoo.org/viewtopic.php?t=66930

  9. My experiences with distcc by meowsqueak · · Score: 5, Informative

    I've spent the last week setting up a Gentoo cluster with distcc and I've noticed a few things:

    1. when *recompiling*, the speedup from ccache far outweighs what distcc gains you on the first compile. If you're testing distcc, you need to be aware of this and disable ccache.

    2. most large packages either disable distcc (e.g. xfree by limiting make -jX) or compile small sets of files in bursts and spend the majority of time performing non-compilation and linking. Distcc helps with the compilation but because it's only a small part of the total build time, the overall improvement isn't as great as you might have hoped.

    3. distccmon-gnome is very cool.

    4. using distcc with Gentoo transparently involves modifying your PATH, and this can make non-root compilations troublesome (permissions on distcc lock files). I haven't figured this one out yet, other than specifying the full path to the compiler: make CC=/usr/bin/gcc rather than CC=gcc.

    5. the returns from adding an extra distcc server to the pool drop considerably after the first few machines. Even on a 1 gigabit LAN the costs of distcc catch up with the benefits after a while. This is more of a concern when compiling lots of small files.

    6. it can handle cross-compilation with a bit of configuration.

    So although distcc can often reduce build time, it's not quite as effective as you might assume or hope at first.
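    Points 2 and 5 can be illustrated with a toy cost model. This is an assumption for illustration, not something measured above: distcc divides only the compile portion of a build across hosts, the non-compile portion (configure, linking and so on) stays serial, and each extra host is assumed to add a fixed overhead. The function names and all numbers are made up.

```python
def build_time(n_hosts: int, serial_s: float, compile_s: float,
               per_host_overhead_s: float) -> float:
    """Toy estimate of wall-clock build time (seconds) with n_hosts compilers.

    serial_s:            work distcc cannot spread out (configure, linking, ...)
    compile_s:           total compile time on a single machine
    per_host_overhead_s: assumed extra cost for each additional host
                         (network transfer, preprocessing, scheduling)
    """
    if n_hosts < 1:
        raise ValueError("need at least one host")
    return serial_s + compile_s / n_hosts + per_host_overhead_s * (n_hosts - 1)


def best_host_count(serial_s: float, compile_s: float,
                    per_host_overhead_s: float, max_hosts: int = 64) -> int:
    """Host count in 1..max_hosts that minimises the toy build time."""
    return min(range(1, max_hosts + 1),
               key=lambda n: build_time(n, serial_s, compile_s, per_host_overhead_s))


if __name__ == "__main__":
    # Made-up numbers: 5 min of non-compile work, 10 min of compiling,
    # 6 s of overhead per extra host.
    for n in (1, 2, 4, 8, 16):
        print(n, round(build_time(n, 300, 600, 6), 1))
    print("best:", best_host_count(300, 600, 6))
```

    Under this model the total is smallest around n = sqrt(compile_s / per_host_overhead_s) hosts (10 for the made-up numbers), and no number of hosts can push the build below serial_s, which is why a mostly non-compile build sees little benefit.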

  10. Try ccache by AT · · Score: 2, Informative

    While it is slightly different in concept, check out ccache. It only uses a single computer but it can significantly speed up your compiles. It works by caching the results of each compilation; it will only help if you compile the same code over and over.
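    The idea is that the same compiler, the same flags and the same source always produce the same object file, so the result can be looked up instead of recomputed. A minimal in-memory sketch of that idea, assuming a simplified key of compiler identifier, flags and preprocessed source (ccache's real key includes more details, and it stores results on disk); the `CompileCache` class and the `real_compile` callback are hypothetical names, not ccache's API:

```python
import hashlib
from typing import Callable, Dict, List

# Stand-in for running the real compiler: (preprocessed source, flags) -> object bytes.
CompileFn = Callable[[str, List[str]], bytes]


class CompileCache:
    """Toy compile cache in the spirit of ccache (not its actual code)."""

    def __init__(self, compiler_id: str) -> None:
        self.compiler_id = compiler_id      # e.g. a `gcc --version` banner
        self.store: Dict[str, bytes] = {}   # hash key -> cached object bytes
        self.hits = 0
        self.misses = 0

    def key(self, preprocessed: str, flags: List[str]) -> str:
        h = hashlib.sha256()
        # Length-prefix every field (and record the flag count) so that
        # different inputs can never run together into the same byte stream.
        for field in [self.compiler_id, str(len(flags)), *flags, preprocessed]:
            data = field.encode()
            h.update(len(data).to_bytes(8, "little"))
            h.update(data)
        return h.hexdigest()

    def compile(self, preprocessed: str, flags: List[str],
                real_compile: CompileFn) -> bytes:
        k = self.key(preprocessed, flags)
        if k in self.store:
            self.hits += 1                  # identical inputs: reuse the result
            return self.store[k]
        self.misses += 1                    # first time: really compile, remember it
        obj = real_compile(preprocessed, flags)
        self.store[k] = obj
        return obj
```

    Because the compiler identifier is part of the key, upgrading gcc makes every old entry miss, which is the safe behaviour given the version issues discussed elsewhere in this thread.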

  11. distcc causes kernel panic... by kidlinux · · Score: 4, Informative

    At the moment there's a bug in Linux kernel 2.4.26 that causes the remote compiling systems to encounter a kernel panic (and crash).

    It's a known bug and has been discussed on the LKML. The bug is also discussed on the Gentoo Bugzilla. A patch is also available, though the patch program didn't work for me, so I had to apply it manually.

    The patch seems to be holding up, too. If you're using distcc on systems with vanilla 2.4.26 kernels, I'd suggest patching them.

    --
    -kidlinux.
  12. Incredibuild by kingos · · Score: 5, Informative

    For those using Visual Studio on Windows, I highly recommend a tool called Incredibuild to do the same job. It is not free like distcc, but is very effective and integrates nicely with Visual Studio. It cut my build time for a project at work from 15 minutes to 1 minute 20 seconds. Nice!

    kingos

  13. And by the way, GCC 3.4.1 by eddy · · Score: 4, Informative

    ... was just released.

    Only available on mirrors, currently.

    --
    Belief is the currency of delusion.
  14. Also of Note. by temojen · · Score: 2, Informative

    Many programs that can use the SSE or MMX extensions (such as video codecs or Software OpenGL) will fail to compile with DistCC, reporting "Couldn't find a register of class BREG".

    If the failing package is one of many, you can emerge just the one package by changing make.conf FEATURES to not include distcc.

    FEATURES="ccache" emerge xine-libs
    emerge <whatever you were emerging before>
    Note that I've used the ccache feature here. It's really handy because it won't recompile any parts that have already compiled successfully. When make quits on an error, emerge cleans out the sandbox, so you can't just let make take care of the already compiled objects.
  15. Sadly, you can't put saved time in the bank. by anti-NAT · · Score: 2, Informative

    While I've come across your argument a lot, regarding lots of little amounts of saved time, I struggle to see any actual value in these "micro" time savings.

    How do you collect all these micro time savings into a "new" hour in which you can usefully do something else?

    The only application you've mentioned where I think these optimisations matter is DVD encoding. More broadly, useful time savings can be gained there because these jobs tend to take a lot of time; e.g. I would consider saving 5 minutes on an hour's video processing to be worth it. 5 seconds saved per hour wouldn't be.

    I don't see how reducing SSH connection setup time by 0.25 of a second is going to make me dramatically more productive, even if I do it 1000 times a day. 1000 * 0.25 = 250 seconds, or just over four minutes a day saved. I spend more time in the toilet than that; maybe I should start taking my computer to the toilet with me?

    My point is this - you need to do a cost benefit analysis, to determine whether the actual time saving has any useful value. In a lot of cases, on the typical, relatively high-performance machine, across the board CPU optimisations don't have any useful value.

    I think throwing away your TV set would be far more productive than compiling everything from source in most cases.

    --
    The Internet's nature is peer to peer - 20050301_cs_profs.pdf
  16. Re:Less and less necessary in the future by Anonymous Coward · · Score: 1, Informative

    Yes, there is lots Mozilla can do that Netscape 3 couldn't: CSS, proper rendering (gasp), the myriad of tags that have been added to HTML since those days, lots of new JavaScript abilities (did NS3 even have JavaScript?), and so on. It might not fully excuse the bloat, but it's not as bad as you make it out to be. Things HAVE gotten more advanced.

  17. Close, but still wrong... by sultanoslack · · Score: 2, Informative
    The article also says that other than distCC, the computers need not have anything in common; this is not strictly true. Different major versions of GCC can cause problems if you are trying to compile with optimization flags that are only on the newer version.

    Heh. No. GCC pretty regularly breaks binary compatibility for the C++ ABI. Breaks were at GCC 2.95 -> 2.96 (though 2.96 was just a RH/Mandrake thing), 2.x -> 3.0, 3.1 -> 3.2 and 3.3 -> 3.4.

    You can't mix C++ compiled with any of the compilers that are on opposite sides of those splits.

    Another thing is that some very large packages have trouble with distributed building of any sort (either multiple threads on the same machine, or over a network like with distCC). As far as I know, at least parts of [...] KDE and the kernel.

    Wrong again. Both KDE and the Linux kernel build fine with distributed compilation systems. In fact, KDE has been used with distcc and TeamBuilder for years (TB has been used at the last several KDE meetings) and now there's even IceCream (developed by KDE folks) in KDE's CVS which is sort of based on distcc, but has a central scheduler and does better automatic configuration. It also gets around the first issue above because it's able to build a basic runtime environment based on your system tools (compiler, glibc, etc.) and ship it over to the host machines to build in a chroot environment.

    The kernel also works just fine in a distributed build environment; I build it regularly with such.