Reduce C/C++ Compile Time With distcc
An anonymous reader writes "Some people prefer the convenience of pre-compiled binaries in the form of RPMs or other such installer methods. But this can be a false economy, especially with programs that are used frequently: precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine. If you use a distributed compiler, you get the best of both worlds: fast compile and faster apps. This article shows you the benefits of using distcc, a distributed C compiler based on gcc, that gives you significant productivity gains."
While distCC is a great tool, there are a couple of things to mention. First, the article blurb states that distCC is "a distributed C compiler based on gcc." It is actually a method of passing files to GCC on a remote computer in such a way that the build scripts think the compilation was done locally.
The article also says that, other than distCC, the computers need not have anything in common; this is not strictly true. Different major versions of GCC can cause problems if you are compiling with optimization flags that only the newer version supports. I have run into this on my Gentoo box when trying to use an outdated version of GCC on a Red Hat box.
Another thing is that some very large packages have trouble with distributed building of any sort (either multiple threads on the same machine, or over a network as with distCC). As far as I know, at least parts of XFree86, KDE and the kernel turn off distributed compiling during the build. Some of this might just be in the Gentoo ebuilds, but I think some of it is in the actual Makefiles. If a program has trouble compiling, it's always worth a shot to turn off distCC.
A good resource for setting up distCC on a Gentoo system (since compiling is such a large part of Gentoo, this is particularly important) is gentoo.org's own distCC guide.
It's also been discussed here on Slashdot (two years ago!) in "A Distributed Front-end for GCC" and earlier this year in "Optimizing distcc."
Distcc is great for installing Gentoo on an older computer because you can have other (faster) computers help with the compile, and if you like distcc, you may also like ccache.
Nothing to see here.
Your hair look like poop, Bob! - Wanker.
Those of us who use Gentoo have known about, and relied on, distcc for quite a while now.
That's why I use Gentoo!
Now one can install Gentoo in _only_ 5 days!
I think nc can be used like distcc by redefining CC="nc gcc". However, more commonly it is done by putting $(NC) at the beginning of the build rules. Then you can use nc for any build rules, not just C compiles.
In addition to use with make, nc works well with SCons.
Convenience? How about just plain saving time.
I just installed a second Gentoo box, a lowly Pentium II 400 MHz with 128 MB RAM, and read up on distcc. It certainly makes compiling Gentoo a LOT faster when the otherwise poor, underpowered box can ask my Athlon XP 2400 for some compiling help.
It also requires rather minimal configuration on my part and for the most part "Just Works[tm]".
Hehe... now if only I had a Beowulf cluster....
My family can't afford more than one computer, you insensitive clod!
But seriously, is there a way to make use of the concepts embodied in distcc in a home computing environment? Or is distcc designed for use by businesses and schools?
Well, they might as well merge sites if they're going to share stories.
Until Slashdot fixes the funny modifier, use insightful or interesting. The poster knows your intentions.
One of the problems I have encountered is that distcc only works reliably in a homogeneous environment where each box has the same version of gcc on the same platform. I have yet to find a way to make my Gentoo box work with my Cygwin box.
Shouldn't you be using a compiler other than gcc if you want faster apps? As far as I know, gcc is not the fastest, most optimizing compiler available?
File under "old news".
"Learning is not compulsory... neither is survival."
-- Dr. W. Edwards Deming
Any application I have to compile before using with the 'correct optimizations for my machine' will take more time to get up and running than any 'productivity gains' it might produce. This is why Linux is still not accepted by mainstream computer users. They don't care how it works, just that it does.
Compare the speed cost of loading a "generic" binary to an "optimised" one, multiply by the number of times you load that binary.
Then look at the time required to compile the optimised copy.
How often, in the lifetime of a particular version of a binary, do you really need to reload it?
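The comparison these questions ask for can be put as a break-even count: how many launches it takes before the time saved per launch repays the one-off compile. A minimal sketch in C, assuming integer millisecond timings; the function name `break_even_runs` and its parameters are hypothetical, not taken from any tool.

```c
#include <assert.h>

/* Hypothetical helper: how many times an optimised binary must be run
 * before the time it saves per run repays the one-off cost of building it.
 * All times are in milliseconds.
 *   compile_ms   - time spent compiling the optimised copy
 *   generic_ms   - run time of the generic, precompiled binary
 *   optimised_ms - run time of the locally optimised binary
 * Returns -1 if the optimised binary is not faster, i.e. it never pays off. */
long break_even_runs(long compile_ms, long generic_ms, long optimised_ms)
{
    assert(compile_ms >= 0);
    long saved_per_run = generic_ms - optimised_ms;
    if (saved_per_run <= 0)
        return -1;
    /* Ceiling division: a partial run's worth of savings still needs one more run. */
    return (compile_ms + saved_per_run - 1) / saved_per_run;
}
```

With a one-hour compile (3,600,000 ms) and 200 ms saved per launch, `break_even_runs(3600000, 2000, 1800)` returns 18000 launches before the compile pays for itself.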
The promise of distcc is closely related to source distributions like Gentoo. The benefit is overstated. Don't waste your time.
Quick wafting zephyrs vex bold Jim
Personally, I think that distcc will become more and more useless as computers get faster. My new machine (P4 2.8 GHz, 1 GB RAM, SATA drives) can compile a complete Gentoo desktop system in about two hours. That's pretty damn cool considering that it used to take about 24 hours on my old laptop when I first started using Gentoo several years ago. It would probably take only about an hour to set up a Gentoo server system on the same machine, since the biggest component, X, would not need to be compiled.
Computing power is outstripping the size of source code that needs to be compiled. Soon there will be little difference in install time between the source and binary distros, and all the jokes about Gentoo's compile time will be pretty much obsolete. Already, once you have your system installed, the time required to keep current/install new apps is minimal. My system can compile any new program (except maybe OpenOffice) in under 25 minutes. Even Mozilla can be compiled in that time.
Well, as someone who recompiles FreeBSD/DragonFly quite frequently, I've got to say that the best way to reduce the time it takes is to build everything in a ramdisk. I've cut 100-minute compile times down to about half an hour by mounting /usr/obj in a ramdisk instead of on my hard drive.
http://bsdvault.net/sections.php?op=viewarticle&artid=53
How long does it take to compile "Hello World"?
We use the distcc that Apple distributes with Xcode even though we don't use Xcode itself. It really helps to get a few dual-CPU G5s working!
The cool thing about Apple's version is that by default it uses Rendezvous to determine which machines are available to distribute work to.
Reducing compile time by distributing the load isn't reducing it at all; it's just distributing it. Try using a compiler that compiles fast, such as Plan 9's compilers.
Which makes it a pain in the ass if you ask me.
I have tried to use distcc for a lot of stuff, but it doesn't work on some packages, and that's enough to make me not use it.
I don't want to have to hand-pick which packages to use it with and which ones to not use it with. Fortunately, a lot of Gentoo packages have a rule built in to not use distcc automatically, but it's not always the case.
The other thing about distcc is that it won't increase the speed of the compile by any large magnitude with each machine added, because the machine performing the actual compile has to do a lot more work than the slaves.
Unless I was trying to compile something on a REALLY slow machine, I don't bother with distcc.
- It's not the Macs I hate. It's Digg users. -
Precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine
And there are maybe about ten to fifteen people on all of slashdot who actually know how to go about setting the right optimizations for their own machine.
"(...) a distributed C compiler based on gcc, that gives you significant productivity gains."
Assuming
a) That compiling will give you any significant performance increase (which I kinda doubt, it's not like the defaults are braindead either)
b) You don't spend more time mucking about with distCC / compiling than you'll actually use the software
c) Your software is actually code bound (and not "What do I type/click now?" human bound, or bandwidth bound or whatever)
I can't think of a single thing I do that's code bound. And I actually do a bit of compiling, but I spend those seconds thinking about what to code next. Either that, or it is bandwidth bound or non-time-critical (i.e. does it take 6.5 hours or 7 hours? Who cares. The difference is half an hour's work for my computer, 0 for me). So the time I'd spend to improve it is - gasp - 0.
Kjella
Live today, because you never know what tomorrow brings
I have 2 machines with K6-2 processors. Currently, I use "-Os" because I have read that reducing the code size can improve performance more than "-O3" on machines with small caches.
Does anyone have any advice about this? Are there any objective comparisons that relate to my configurations?
The real "Libtards" are the Libertarians!
It's news to people that don't read slashdot every day.
I don't mind revisiting older topics once in a while - it's only annoying when it's two days in a row. And even then, it's not that big of a deal, I simply pass over it.
Posts like this are more of a waste of space than a duplicate article post, and you get a lot more posts like yours than we do dupes. It's especially annoying when people say "We talked about this TWO YEARS AGO!!!" Well here's some news for you: I don't memorize every Slashdot story since the beginning, and there have been a lot of new members since then.
Other than distributed compiler tools like distcc and nc, are there any other ways of speeding up a Linux compile with gcc?
I was blown away when my project group compiled a Qt app that we developed on the Linux platform with the MS VC++ compiler. The compilation took 1/10th the time! We were using Makefiles generated by QMake in both cases.
Should I just switch compilers? If so does anyone have any suggestions?
Story is copy/pasted straight outta' OSNews.com
http://funroll-loops.org/
The problem with compiling your own binaries is that you are effectively forking code from the original distribution at the low level. To do this you really must know what you're doing, and that can be a very difficult thing when working with applications you didn't write yourself.
Just look at the Linux Kernel Mailing List and how many errors can be traced to a specific GCC version. That's why Linus enforces a standardized compiler environment: developers can't be wasting their time fixing compiler-induced errors.
I know it's attractive to just recompile your whole distribution to your specific hardware combination because there are real world performance gains, but sometimes there are weird bugs caused by it and you'll probably be out of luck trying to find some documentation on them. What are the chances of somebody having the same hardware configuration? And remember we're not talking about branded components and specific models, we must throw in firmware, drivers, BIOS settings and whatnot into the mix.
As long as PC components are not standardized, this problem is never going to go away. I seriously considered Mandrake and Gentoo a couple of times in the past, and they had very different bugs on each version every time I tried them. Even though they have gotten better with each release, I'd still refuse to put them on a production machine; there's a reason why every distro ships with a precompiled i386 kernel.
I, for one, just recompile the most important parts of a system that do require most of the CPU time, like the kernel, Apache, and other runtime libraries whenever I do need that extra punch, not a second before. distcc is a geek tool and has that coolness-factor and all, but I'm not on a frenzy to use it to recompile all my servers' software, I care about stability first.
- Otaku no naka no otaku, otaking da!!!
Wow! My company has been doing distributed compiles for about fifteen years now (with gcc, no less). It's old hat. But along comes some guy telling Gentoo users to use distributed compiles, and suddenly it's the next best thing to sliced bread! This is like the sixth or seventh distcc article I've seen in the last month.
It's really nice that you guys have discovered this, but don't act like it's something new and amazing, or even that it's something unique to Linux.
Don't blame me, I didn't vote for either of them!
Honest question: gcc has the reputation of not producing the fastest code for x86, so why should I bother compiling gentoo with gcc or distcc?
Does anyone know if there are distros compiled with, say, the Intel compiler?
While I haven't RTFA yet, I find the premise stated in the posting somewhat far-fetched. If I need binaries tuned finer than those provided by binary .rpm's, I can take their respective .src.rpm's and rebuild them. The RPM build system, in the distributions I know, provides a convenient way to override optimization flags via system- or user-settable macros. As for compilation time, it's not an issue for most packages these days, as many Gentoo users here can testify.
My exception safety is -fno-exceptions.
Today, if the boss catches me reading /. I can say "I only do it during long compiles, honest!"
This could ruin EVERYthing!
1999, my senior project in college. There were 5 of us working on it. We were sponsored by a company that was a subsidiary of IBM (cannot for the life of me remember their name). Our first iteration used TCP/IP for the connectivity between the "master server" and the clients.
There was no guarantee that your client would be selected for use, but it could be, and participation was optional. It of course used distributed file systems for the project. In the end, all the OBJ files were linked together by the machine that initiated the build process.
We had methods to list what compiler you were using (so only machines with that compiler would be used), we had machine "ratings" so that a faster machine would be more used than a slow one, and were planning on monitoring CPU usage and other things to determine on the fly what machines were suitable. Of course, we didn't have time to do everything.
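The poster doesn't show how the machine ratings were used, so what follows is only a minimal sketch of one plausible scheme, not the project's code: each job goes to the machine that would finish its queue soonest, estimated as (jobs assigned + 1) / rating. The `struct host` type, `pick_host`, `distribute` and the integer ratings are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build host: 'rating' is a relative speed (higher = faster). */
struct host {
    const char *name;
    int rating;     /* must be > 0 */
    int assigned;   /* jobs handed to this host so far */
};

/* Pick the host whose estimated finishing time, (assigned + 1) / rating,
 * would be lowest if it received one more job. The fractions are compared
 * by cross-multiplication to stay in integer arithmetic; ties go to the
 * earlier host in the array. Returns the index of the chosen host. */
size_t pick_host(const struct host *hosts, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        long lhs = (long)(hosts[i].assigned + 1) * hosts[best].rating;
        long rhs = (long)(hosts[best].assigned + 1) * hosts[i].rating;
        if (lhs < rhs)
            best = i;
    }
    return best;
}

/* Hand out 'jobs' compile jobs one at a time. */
void distribute(struct host *hosts, size_t n, int jobs)
{
    for (int j = 0; j < jobs; j++)
        hosts[pick_host(hosts, n)].assigned++;
}
```

With ratings of 3 and 1, handing out eight jobs gives the fast machine six and the slow one two, matching the 3:1 speed ratio.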
The second iteration we took on it (this was all done over a 20-week period, the first 10 weeks design, the next 10 weeks coding), we used CORBA for the inter-machine communication.
This is also the same project where I stupidly did this.
It's good to see that commercial-scale compiling is now available to the dedicated hobbyist. This and other applications of distributed computing have the potential to not only put to some useful purpose the generation or two of old machines from previous upgrades, but to inspire the creation of other distributed applications for them.
One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
I use prebuilt binaries for most applications because it saves the hassle of having to keep recompiling whenever a patch comes out - a quick up2date or apt-get every now and then and all is good.
The only things I generally recompile are the Linux kernel itself and Apache/PHP, and then only on the production servers where speed is at all an issue - for desktop use the out-of-the-box binaries run just fine, even on my old sub-1GHz machine!
I'd be interested to see what speed improvements would be found if, say, a Gentoo system was built using the Intel compiler (with an Intel CPU, obviously ;-) instead of GCC. Has anyone tried creating RPMs or a whole distro using another compiler?
Code, Hardware, stuff like that.
When would it make sense to compile my apps from source and turn on all of the optimization? Would it help much for a desktop user or is it something better suited for someone who has a specific busy task, like a busy web server?
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
I've got two Gentoo systems that run distccd.
I did some non-scientific testing with distcc.
System one:
Athlon XP 1400
512 MB RAM
System two:
Pentium III 450
512 MB RAM
I compiled GAIM on system one twice: once with distccd running on both systems, and once with distccd running on system one only.
With both systems running distccd, the compile was about two minutes faster than with distccd running on system one alone.
With distccd running on just system one, I found that it would process many of the individual compiles in parallel, à la SMP; the thing is, it's a single-processor system. I haven't tested the time on system one with distccd versus without it. I imagine that with the parallel compiles it works faster. It all depends on what you set your -jN to, where N is twice the number of systems plus one. I found that with two systems I could run well with a -j of 7, a bit higher than suggested.
It is true that many programs with sensitive builds, XFree and Opera for example, turn off the -j option. Not a big deal; it just means a longer coffee break.
Distccd came in very handy when I was installing Gentoo on an old Gateway 2100 Solo laptop. The laptop only has a Pentium 120 and 40 MB of memory.
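The -j rule of thumb in this post, written out as a tiny C function. `suggested_jobs` is a hypothetical name, and it assumes "systems" means every machine running distccd, including the local one.

```c
#include <assert.h>

/* Rule of thumb from the post: -j N with N = 2 * (number of systems) + 1.
 * Assumes 'systems' counts every machine running distccd, local one included. */
int suggested_jobs(int systems)
{
    assert(systems > 0);
    return 2 * systems + 1;
}
```

For the two machines above this suggests -j5, while the poster found -j7 worked well, so the formula is best treated as a starting point for experimentation.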
I'd suggest distcc for anyone who does quite a bit of source builds, a must for a Gentoo install!
--All programmers are playwrights and all computers are lousy actors.
I like the idea -- can you tell it not to use your CPU at all so you can browse the web quickly while the rest of the LAN bogs down compiling your kernel?
The anti-Gentoo zealots always say stuff like "It's not going to be faster..." etc. etc.
:P
Now someone is saying: "precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine."
So the Gentoo users who claim that stuff compiled with the right optimizations *is* faster are right?
I'm confused. Which is it supposed to be? Are Gentoo users full of crap, or are they correct?
I use Gentoo and have found things to be a hell of a lot easier to deal with than RPM based binary distros anyway.
I just want the scoop.
Oh, distcc has been in Gentoo for a while.. surprised to see it listed like it's a new thing.
I've spent the last week setting up a Gentoo cluster with distcc and I've noticed a few things:
1. when *recompiling*, the advantage due to ccache far outweighs the performance of distcc on the first compile. If you're testing distcc you need to be aware of this and disable ccache.
2. most large packages either disable distcc (e.g. xfree by limiting make -jX) or compile small sets of files in bursts and spend the majority of time performing non-compilation and linking. Distcc helps with the compilation but because it's only a small part of the total build time, the overall improvement isn't as great as you might have hoped.
3. distccmon-gnome is very cool.
4. using distcc with Gentoo transparently involves modifying your path and this can make non-root compilations troublesome (permissions on distcc lock files). I haven't figured this one out yet other than to specify the full path to the compiler: make CC=/usr/bin/gcc rather than CC=gcc.
5. the returns from adding an extra distcc server to the pool drop considerably after the first few machines. Even on a 1 gigabit LAN the costs of distcc catch up with the benefits after a while. This is more of a concern when compiling lots of small files.
6. it can handle cross-compilation with a bit of configuration.
So although distcc can often reduce build time, it's not quite as effective as you might assume or hope at first.
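Points 2 and 5 are essentially Amdahl's law: only the compilation part of a build is spread across machines, so the serial remainder caps the overall speedup. A minimal sketch, assuming equally fast helpers and zero network or scheduling overhead (both optimistic); `build_speedup` is a hypothetical name.

```c
#include <assert.h>

/* Amdahl's law applied to a build: if a fraction 'p' of the wall-clock time
 * is compilation that distcc can spread over 'n' equally fast machines, and
 * the rest (configure, preprocessing, linking, serial make steps) stays on
 * the local box, the best possible overall speedup is
 *     1 / ((1 - p) + p / n)
 * Network transfer and scheduling overhead are ignored, so real numbers
 * will be lower. */
double build_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

For instance, if compilation is half of a package's wall-clock build time, `build_speedup(0.5, 4)` is 1.6, and no number of helpers can push it past 2.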
How do the major distributors (Red Hat, SuSE, ...) manage their compiles?
precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine
A straw man. Precompiled binaries may have been compiled with the optimal settings for your machine, and binaries which you compile may not have the optimal settings. Identifying the optimal settings can actually be non-trivial. Source-based distributions are not necessarily the best fix to the 'one-size-fits-all' approach used by some distros.
I came across distcc by chance about 4 months ago, and I must say, it has utterly improved things around here.
We regularly develop/compile/debug a moderate-to-small software package, typically taking about 1 minute per compile. Now, while 1 minute doesn't sound like a long time, it starts adding up when you find yourself recompiling 100+ times a day.
With the inclusion of distcc into the whole situation, we're able to reduce that 1 minute compile down to a little less than 20 seconds; highly appreciated (although now we have fewer excuses to go get a coffee :-( ).
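A rough worked version of the savings described here, assuming exactly 100 compiles a day and taking 20 seconds as the new compile time; since the post says "100+" compiles and "a little less than 20 seconds", this is a lower bound:

```latex
\text{saved per day} \approx (60\,\mathrm{s} - 20\,\mathrm{s}) \times 100 = 4000\,\mathrm{s} \approx 67\,\mathrm{min}
```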
Distcc is a great package which can be extremely useful.
PLD.
While it is slightly different in concept, check out ccache. It only uses a single computer but it can significantly speed up your compiles. It works by caching the results of each compilation; it will only help if you compile the same code over and over.
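The caching idea can be sketched as follows. This is only an illustration under simplifying assumptions, not how ccache is implemented: the key is a pair of FNV-1a hashes of the compiler arguments and the preprocessed source, the cache is a tiny direct-mapped in-memory table, and `real_compile` is a hypothetical stand-in for running the compiler. Two different inputs with identical hashes would wrongly hit; a real tool relies on a much stronger hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET_BASIS 14695981039346656037ULL
#define FNV_PRIME        1099511628211ULL
#define CACHE_SLOTS      64

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = FNV_OFFSET_BASIS;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= FNV_PRIME;
    }
    return h;
}

/* One slot of a toy direct-mapped cache: key -> "object file". */
struct cache_entry {
    int used;
    uint64_t src_key;   /* hash of the preprocessed source */
    uint64_t arg_key;   /* hash of the compiler arguments  */
    const char *object; /* stands in for the cached .o file */
};

static struct cache_entry cache[CACHE_SLOTS];
static int compiles_run = 0;

/* Hypothetical stand-in for actually invoking the compiler. */
static const char *real_compile(const char *preprocessed, const char *args)
{
    (void)preprocessed;
    (void)args;
    compiles_run++;
    return "object-code";
}

/* Return the cached object if this exact (source, arguments) pair was
 * compiled before; otherwise compile it and store the result, evicting
 * whatever previously occupied the slot. */
const char *cached_compile(const char *preprocessed, const char *args)
{
    assert(preprocessed != NULL && args != NULL);
    uint64_t src_key = fnv1a(preprocessed);
    uint64_t arg_key = fnv1a(args);
    struct cache_entry *e = &cache[(src_key ^ arg_key) % CACHE_SLOTS];

    if (e->used && e->src_key == src_key && e->arg_key == arg_key)
        return e->object;                 /* hit: the compiler is skipped */

    e->object = real_compile(preprocessed, args);   /* miss */
    e->src_key = src_key;
    e->arg_key = arg_key;
    e->used = 1;
    return e->object;
}
```

Calling `cached_compile` twice with the same source and arguments runs the compiler once; changing the arguments (say, -O2 to -O3) forces a real compile, as it should.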
Seriously though -- Microsoft is the one company that can guarantee that their source code sizes will continually outstrip computational power. I wonder what kind of clustering solution they use to get their Windows builds to compile in a reasonable timeframe?
"The problem with compiling your own binaries is that you are effectively forking code from the original distribution at the low level."
Wow. That is blatantly false and misleading. "Forking code" refers to one piece of code diverging into two separate projects, à la X.org and XFree86. All binaries have to be compiled somewhere... when the guys at Red Hat or SuSE do it for you, they generally do it in a very generic, compatible way, and when you do it yourself, you can take some more risk or tailor it to your own needs.
Furthermore, unless you use a different version / environment settings, the result is exactly the same as if someone else did it. There's nothing inherently dangerous or "fork-inducing" about compiling your own source code...only if you use an unstable version of the compiler or pass dangerous optimization flags.
"Just look at the Linux Kernel Mailing List and how many errors can be traced to a GCC specific version."
OK, there are glitches in some versions of GCC, so what? You seem to have a problem with people compiling their own source and distcc, but none of this has anything to do with GCC. The point is, whatever you're compiling, distcc is going to get it done faster. Sure, there may be reasons why people who don't really have a firm grasp on linux may need to avoid using unstable versions of GCC or aggressive compiler options, but that is really unrelated to the idea of a distributed compiling architecture.
But there is another kind of evil that we must fear most... and that is the indifference of good men.
But this can be a false economy...
Every time something that is distributed in binary is rebuilt from source for local use, by definition it's to change some assumption that was inherent in the testing of the original binary (or else the binary distribution would suffice). And with that, some non-0 confidence that was built into the binary release by that testing is wiped out and must be recovered by local analysis and testing (i.e., time and effort) or reduced expectations. Otherwise, it's running on blind faith. This is particularly true with programs that are used frequently, i.e., one expects to depend on them repeatedly. So in my mind, "the best of both worlds" is more meaningful if it refers to fast and reliable apps. I don't care how fast the compiler is if I can't trust the results anymore. That is a different economy equation, and completely justifies the "convenience" of pre-compiled binaries in many applications.
Astounded, think I'm lying, then you were wrong.
Not that it doesn't take me two seconds, but that 'Computing power is outstripping the size of source code that needs to be compiled.'
thank God the internet isn't a human right.
How is that better than the JVM or, even better, the CLI?
It is an attempt to solve a problem already solved (in a much more elegant and portable format) using outdated techniques.
Just mho
I wanted to point out: if you are going to try distcc, run it with unsermake, for example to compile KDE. Try Googling unsermake; it makes distributed compiling much faster.
Best is: ccache, distcc and unsermake.
Precompiled headers can give more than a 50% improvement in compile time. They don't work for a lot of things, but they kind of work with Qt.
In the real world, precompiled headers, offset caching and all that kind of crap would be seamless, quick and easy, but in the GCC (portability) world they haven't quite made it yet.
... is sitting on a sl5500, compiling something right now with a fleet of 20 x86-based 'blades', on various parts of the VPN, at various parts of the world which I hope to someday never have to visit to administer!
#/usr/src/c_climber/./configure; make ; make install
How much productivity is lost in setting up the distributed compilation network and environment when compared to just doing a few "./configure; make install" commands?
Imagine a Beowulf Cluster of these! Oh wait....
piqued my interest.
what does it mean?
My main use for distcc currently is building software for my powerbook.
I do a lot of work with Qt on both Linux and Mac, and let's just say Qt compiles very slowly on my PowerBook (which is an older 800 MHz G4).
Also, I've had to build all of Qt on this machine because the fink packages are old and don't even use the Mac version (they use the X11 version which really sucks and makes apps on Macs look like crap).
So at work we have a couple of dual G5s I use, and also a few Linux machines which I've built Darwin cross-compilers for (yes, it's a pain in the ass).
I remember compiling 2.0.x kernels on my own 100MHz Pentium. It took -forever-.
Later on, I built a 350MHz K6-2 machine for a customer, and it was a screamer compiling its 2.0.x kernel, taking just a few minutes.
Fast forward: I've got a very similar K6-2 350 as a miscellaneous server and firewall here. Compiling its 2.6 kernel takes -forever-.
But the new 2.4GHz HT 800MHz FSB P4 box I built recently for work is again a screamer, compiling its 2.6 kernel in a few minutes. This box is in roughly the same relative performance league as that K6-2 was Back In The Day.
Moral of this story: The more things change, the more they stay the same. Program and compiler complexity has kept pace with increases in processor speed, leaving the time to compile x more or less constant over at least the past few years.
'Sides, even if you can build a proper desktop system in two hours, distcc serves well to decimate the amount of time required. Whether it is used to cut that two-hour-long run down to 30 minutes, or a 5-minute compile down to 60 seconds, distcc will always have its place[1].
[1]: Yes, yes, I know. People everywhere are saying "But who cares if it takes 5 minutes instead of 60 seconds? It's not like I can't continue using the machine while it's compiling." These people are ignoring the human aspect of the whole thing, which can be summarized as follows: Wife, house, kids, cars, jobs = 4 minutes worth of life that has been rescued from the computer by distcc.
Kid-proof tablet..
At the moment there's a bug in Linux kernel 2.4.26 that causes the remote compiling systems to encounter a kernel panic (and crash.)
It's a known bug and has been discussed on the lkml. The bug is also discussed on the gentoo bugzilla. A patch is also available, though the patch program didn't work for me so I had to apply it manually.
The patch seems to be holding up, too. If you're using distcc on systems with vanilla 2.4.26 kernels, I'd suggest patching them.
-kidlinux.
Think about it, is there anything fundamental that Mozilla can do that Netscape 3 couldn't?
mozilla firefox 0.9 can run 3 hours under Linux without a SIGBUS. Communicator 4.79 cannot.
It's so PAINFULLY SLOW to build anything for the Mac, with this inefficient Objective-C compiler and large linking requirements for Carbon, that without these distributed tools and some G5 servers, it would be hard for us to develop.
Interestingly, our Windows version of this product, built in C#, compiles extremely fast with no distributed trickery needed.
I think the "speed advantage" of source-based distributions is small to nonexistent. The advantage to them is the ability to exclude/include support for various compile-time features (if you don't need TLS auth in an FTP daemon, why compile it in?) and easier cross-platform support.
1. This should go without saying, but I'm going to say it anyway: if you have a file with more than 1000 lines, you probably need to break it up into 2 or 3 files. Likewise, if you have a function with more than 100 lines, you probably need to rewrite it as 2-3 functions. Exceptions can be made for performance-critical functions (and then only when you have profiling data to suggest it is necessary and will make a noticeable difference).
2. Fix your makefile. It sounds like you're compiling "the kitchen sink" on every minor change. If necessary, factor out the code that's forcing you to recompile your entire project. Files generated during compilation should be wrapped with a tight interface (lex/yacc come to mind here). Surely you know better than to modify the equivalent of 'types.h' 100x/day.
3. "Include guards" are your friend.
4. "External" include guards help too (especially if your compiler doesn't support any form of "#pragma once").
Most people do #1 by default. Knowing how to do #2 can make a pretty big difference. The key is to reduce the number of dependencies. However, if those dependencies are unavoidable, then you need #3 and #4. Together they can result in 10-100x speedups in compile time for med-to-large projects (YMMV).
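Points 3 and 4 can be shown in a single C file. The header name `point.h`, the guard macro `POINT_H` and `point_sum` are hypothetical; the header's text is pasted inline twice to mimic it being included twice. Some preprocessors already recognise the internal-guard idiom and avoid re-reading the file, so the extra gain from external guards varies by compiler.

```c
/* ---- First inclusion of the hypothetical header "point.h" ---- */
#ifndef POINT_H              /* internal include guard */
#define POINT_H
struct point { int x, y; };
#endif

/* ---- Second inclusion: the same text pasted again ----
 * POINT_H is already defined, so this body is skipped and struct point is
 * not redefined (a redefinition would be a compile error). With a real
 * second #include, the preprocessor would still have to open and scan the
 * header to find the matching #endif, unless it recognises the idiom. */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
#endif

/* ---- External include guard at the include site ----
 * POINT_H is defined, so this whole group is skipped and point.h is never
 * opened at all. (Because the group is skipped, the file not existing here
 * is harmless.) */
#ifndef POINT_H
#include "point.h"
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```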
For those using Visual Studio on Windows, I highly recommend a tool called Incredibuild to do the same job. It is not free like distcc, but is very effective and integrates nicely with Visual Studio. It cut my build time for a project at work from 15 minutes to 1 minute 20 seconds. Nice!
kingos
The claim also contains the assumption that applications are CPU-bound. All the recompiling in the world won't make something go faster if it's waiting on a disk or a UART or a NIC. Many applications are fast enough anyway -- who cares if /bin/cat gets a 2% improvement of its CPU use? I bet I could add a 20 microsecond gratuitous delay in the main loop of cat, and not noticeably affect its performance!
That said, the kinds of things I would like to have extra-optimized for speed are generally big, huge, complicated things that take forever to compile. Like an Xserver. And that's definitely where distcc could come in handy.
... was just released.
Only available on mirrors, currently.
Belief is the currency of delusion.
I did this 10 years ago using Perl on Sun workstations.
Am I missing something? Why would anyone be regularly doing such long compiles? I remember C++ development in the mid-80s when long compiles and links were common, but that was in the days of 20 MHz 386s. Even then, we soon moved on to the use of incremental compilers - only the relevant (and small) parts of even a large project would be compiled, and then linked back into the pre-compiled code. I can understand why the first compilation of a large project could take some time, but from then on, shouldn't compilation and linking be a minor matter?
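The incremental approach described here is what make's timestamp rule provides: an object is rebuilt only if it is missing or older than something it depends on. A minimal sketch of that decision in C; `needs_rebuild` is hypothetical, and the modification times are passed in as plain `time_t` values (in practice they would come from stat()).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* make-style decision: an object file needs recompiling if it doesn't
 * exist yet, or if the source or any header it depends on was modified
 * after the object was built. */
int needs_rebuild(int object_exists, time_t object_mtime,
                  const time_t *dep_mtimes, size_t n_deps)
{
    assert(n_deps == 0 || dep_mtimes != NULL);
    if (!object_exists)
        return 1;
    for (size_t i = 0; i < n_deps; i++)
        if (dep_mtimes[i] > object_mtime)
            return 1;      /* a dependency changed after the last build */
    return 0;              /* object is up to date: skip the compiler */
}
```

This is also why the earlier advice about not touching a widely included header such as 'types.h' matters: one newer header timestamp marks every object that depends on it for recompilation.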
- Copy the Linux system header files to the Cygwin box.
- Build a GCC Linux-target cross compiler hosted on Cygwin.
- Run the distcc daemon on Cygwin, making sure that it finds the Linux cross compiler binaries (gcc/g++) in the path first (use a script to launch the daemon).
- Point the Linux box to the Cygwin box via DISTCC_HOSTS and go.
Some notes:
- Using -g (debug) in distcc builds can lead to bizarre behavior because the source file path is embedded in the .o file. Try to get all your boxes to use the same directory structure.
- Compiler versions must be IDENTICAL on all boxes if you intend to use C++. Don't trust your system's default g++ compiler (who knows what crazy patches it has); build one from scratch using the same GCC sources on ALL machines. C is more lenient: you can get away with vastly different versions of GCC if you only use C.
Actually, precompiled binaries built by someone who knows what they're doing will almost always give a better-performing system than those compiled with some random combination of compiler options that some 733t d0d@ mistakenly thinks work magic because a /. article said so.
Most of the bottlenecks on my systems are disk-io related. Compiler options matter fuck all, but gcc -Os rocks! gcc -O99 sucks! :-)
Many programs that can use the SSE or MMX extensions (such as video codecs or Software OpenGL) will fail to compile with DistCC, reporting "Couldn't find a register of class BREG".
If the failing package is one of many, you can emerge just the one package by changing make.conf FEATURES to not include distcc.
Note that here I've used the ccache feature. It's really handy because it won't recompile any parts that have already compiled successfully. When make quits on an error, emerge cleans out the sandbox, so you can't just let make take care of the already compiled objects. It's kinda frustrating to run a huge emerge, only to return hours later and see that nothing got built because distcc puked.
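The "won't recompile what already compiled" behaviour comes from keying each object file on everything that went into it. A minimal sketch of such a key, assuming a hypothetical cache_key function that hashes the compiler identity, the arguments, and the preprocessed source. FNV-1a is used here only to keep the example short; it is a stand-in, not the hash ccache itself uses.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* FNV-1a, 64-bit: a simple non-cryptographic hash, used as a stand-in. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;  /* FNV 64-bit prime */
    }
    return h;
}

/* Hypothetical cache key: if the compiler, any argument, or the
 * preprocessed text changes, the key changes and the object must be
 * rebuilt; otherwise a previously cached .o can be reused. */
uint64_t cache_key(const char *compiler_id, int argc, const char *const argv[],
                   const char *preprocessed)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV 64-bit offset basis */
    h = fnv1a_update(h, compiler_id, strlen(compiler_id) + 1);
    for (int i = 0; i < argc; i++)
        h = fnv1a_update(h, argv[i], strlen(argv[i]) + 1);  /* NUL separates args */
    h = fnv1a_update(h, preprocessed, strlen(preprocessed));
    return h;
}
```

Because the compiler identity is part of the key, a different compiler version simply misses the cache instead of reusing incompatible objects.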
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent
Cool, so could you find me a binary package for the application I'm developing which hasn't been compiled yet? That would save me a lot of time. :-)
Karma: It's all a bunch of tree-huggin' hippy crap!
I've come across your argument about lots of little amounts of saved time many times, but I struggle to see any actual value in these "micro" time savings.
How do you collect all these micro time savings into a "new" hour in which you can usefully do something else?
The only application you've mentioned where I think these optimisations matter is DVD encoding. There, useful time savings can be had because those jobs take a long time: I would consider saving 5 minutes on an hour of video processing worthwhile. Saving 5 seconds per hour wouldn't be.
I don't see how reducing SSH connection setup time by 0.25 of a second is going to make me dramatically more productive, even if I do it 1000 times a day. 1000 * 0.25 = 250 seconds, or just over four minutes a day saved. I spend more time in the toilet than that; maybe I should start taking my computer to the toilet with me?
My point is this - you need to do a cost benefit analysis, to determine whether the actual time saving has any useful value. In a lot of cases, on the typical, relatively high-performance machine, across the board CPU optimisations don't have any useful value.
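The cost-benefit analysis asked for here can be made concrete as a break-even figure. A minimal sketch; break_even_days is a hypothetical helper, and it assumes the compile cost counts only to the extent that it is your own time, which is exactly the point disputed elsewhere in this thread.

```c
#include <assert.h>

/* Hypothetical break-even calculation for "is recompiling worth it?".
 *   compile_cost_s   : one-off time the build costs you, in seconds
 *   saving_per_use_s : time saved each time the program is used
 *   uses_per_day     : how often you use it
 * Returns the days until the savings repay the build, or -1.0 if the
 * optimised build saves nothing at all. */
double break_even_days(double compile_cost_s, double saving_per_use_s,
                       double uses_per_day)
{
    assert(compile_cost_s >= 0.0 && uses_per_day >= 0.0);
    double saved_per_day = saving_per_use_s * uses_per_day;
    if (saved_per_day <= 0.0)
        return -1.0;  /* never pays back */
    return compile_cost_s / saved_per_day;
}
```

With the figures above (1000 uses a day at 0.25 s each, i.e. 250 s a day), a build that costs you 2 hours (7200 s) of attention pays for itself after about 29 days; a build that runs unattended overnight costs you nothing and pays back at once.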
I think throwing away your TV set would be far more productive than compiling everything from source in most cases.
The Internet's nature is peer to peer - 20050301_cs_profs.pdf
Technically, all software compiled with CPU optimisations will be faster, when compared with software without those optimisations.
However, the better question to ask is whether the applications in question will be "usefully" faster. For example, if I compiled Firefox with CPU optimisations that made pages load 0.25 of a second faster, is that going to be useful to me, particularly compared to the time it took to compile? Probably not: the latency on the Internet is going to be far greater, and will vary enough that I'll never be able to usefully benefit from the 0.25-second-per-page performance increase. Even if I did benefit per page and, say, loaded 1000 pages a day, I'd be saving 250 seconds per day, which is just over 4 minutes per day. As I mentioned in another post, I spend more time than that in the toilet per day. Does a 4-minute-per-day saving really matter? To me, I don't think it does.
benifits of using distcc
... Apparently one of them is not *spelling*
Actually, I think posts like the grandparent's are useful because that way people who don't read Slashdot everyday or missed the other articles can read them and get up to speed on the discussion/topic. I hope that will lead to fewer redundant comments.
The problem I have is that my compilations are almost always disk bound, and not CPU bound. This is because I do most of my work on a laptop which has a very slow 4200 rpm HDD....
Since distcc still requires the local machine to preprocess all the files and open all the include files (of which there are a great many for my work), I don't really expect distcc to give a huge performance increase; certainly the gains with ccache have been shown to be only around 180 seconds on a normal compile run of 660 seconds...
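The intuition here, that local preprocessing and disk I/O cap what distcc can do, is Amdahl's law. A sketch under idealised assumptions (equally fast helper machines, no network cost); the function name distcc_speedup_bound and the local_fraction parameter are hypothetical.

```c
#include <assert.h>

/* Amdahl's law, used as an idealised model of a distributed build:
 *   local_fraction : share of the build that must run on the local machine
 *                    (preprocessing, linking, reading headers off the disk)
 *   machines       : number of equally fast compile hosts
 * Returns the best-case overall speedup; network transfer is ignored. */
double distcc_speedup_bound(double local_fraction, int machines)
{
    assert(local_fraction >= 0.0 && local_fraction <= 1.0);
    assert(machines >= 1);
    return 1.0 / (local_fraction + (1.0 - local_fraction) / machines);
}
```

For example, if half of a build's wall-clock time were local preprocessing and header I/O on a slow laptop disk, distcc_speedup_bound(0.5, n) stays below 2 for any number of helpers n.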
-- Mike
Does anyone know if precompiled headers ("production quality") are coming to GCC any time soon?
distcc is OK until you want to add quite a few machines. The problem with distcc is that it doesn't scale well. This is where Trolltech's TeamBuilder shines. It's free for personal use with up to 3 machines, available for large-scale commercial compile operations, and supports a variety of CPUs. It is very easy to set up and has a very cool GUI monitoring system.
-- "Perceptions create reality. By changing your perceptions you change your reality."
> But this can be a false economy, especially with
> programs that are used frequently: precompiled
> binaries will never run as quickly as those
> compiled with the right optimizations for your
> own machine.
actually, the obsession with recompiling so-called "optimised" binaries is a false economy. the benefits are minimal while the effort can be considerable (especially for newbies, who are generally the kind of people who are impressed by silly ideas like this).
distcc is a useful tool for those who actually need it but for the vast majority of programs used by the vast majority of people, the average 1-5% performance improvement that can be gained by optimising a compile for your exact CPU will never even come close to compensating for the amount of time spent on editing Makefiles and compiling the software. Like everything else, optimisation is subject to the law of diminishing returns.
CPU-intensive programs (e.g. 3D rendering and other number-crunching apps) can benefit from optimised compilation and may, IFF you use them a lot, be worth recompiling for your system... but in general, it's not worth the bother.
similarly, it may be worth while recompiling heavily used server software (e.g. apache, or postgres) on dedicated servers - but most such services are I/O bound, not CPU bound. Spending hours to shave off a few CPU cycles here and there is generally not worth it....you're better off spending time and money upgrading your disks or installing more RAM.
Someone should mention tcc, the Tiny C Compiler. And it's ultra fast, too! At compiling, that is; the runtime performance I don't know about =) It can also be invoked from within your own C code, and the newly generated code can be dynamically linked and executed from within your C code as well. tcc can be used as a script executor, much like perl. If you're interested in C as a programming language, you have to try tcc.
#!/usr/local/bin/tcc -run
#include <stdio.h>

int main(void) {
    printf("I'm a C Shell Script!\n");
    return 0;
}
It's not that I don't care about optimisation when I choose an rpm over a .tar.gz source. It's just that compiling, at the best of times, is not as straightforward as rpm.
Make frequently fails, so much so that it's often faster to compile each file one by one.
Not that rpm doesn't have its own problems. I'm always running across dependencies or bad headers.
Overall, whenever possible, I choose yum over both. It's slow, frequently times out, and downloads whole loads of headers it never needs, but it finds dependencies, installs automatically, and sets up programs on the system tray.
May the Maths Be with you!
Surely this is a myth?
But this can be a false economy, especially with programs that are used frequently: precompiled binaries will never run as quickly as those compiled with the right optimizations for your own machine.
Here comes the pain...
How can I tell which are the right optimizations? I assume precompiled binaries were compiled with gcc -O3. Beyond -O3 everything goes magic.
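There is no way to know the "right" flags short of timing your own workloads, but you can at least see what a given set of flags changes. A small sketch, assuming GCC or Clang; describe_build_flags and append_word are hypothetical helpers, and the macros checked are a handful of the compiler's standard predefined ones, not a complete list.

```c
#include <stdio.h>
#include <string.h>

/* Append one word to out (space-separated) without overflowing len bytes. */
static void append_word(char *out, size_t len, const char *word)
{
    size_t used = strlen(out);
    if (used + 1 < len)
        snprintf(out + used, len - used, "%s%s", used ? " " : "", word);
}

/* Report a few predefined macros that depend on -O and -march flags.
 * Partial view only: -O2 and -O3 both just define __OPTIMIZE__, so the
 * difference between those two levels does not show up here. */
void describe_build_flags(char *out, size_t len)
{
    if (len == 0)
        return;
    out[0] = '\0';
#ifdef __OPTIMIZE__
    append_word(out, len, "optimize");
#endif
#ifdef __OPTIMIZE_SIZE__
    append_word(out, len, "optimize-size");
#endif
#ifdef __SSE2__
    append_word(out, len, "sse2");
#endif
#ifdef __SSE4_2__
    append_word(out, len, "sse4.2");
#endif
#ifdef __AVX__
    append_word(out, len, "avx");
#endif
#ifdef __AVX2__
    append_word(out, len, "avx2");
#endif
    if (out[0] == '\0')
        append_word(out, len, "(none)");
}
```

Compiling the same file once with -O3 and once with -O3 -march=native (on a GCC recent enough to support it) should reveal any extra instruction-set extensions that -march enables on that CPU; whether they make your programs faster is still a matter of measurement.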
Instead of spending hours and hours doing profiling and testing I'll just gladly accept that my Firefox renders slashdot.org 1.05% slower.
As a matter of interest, have you tried importing your project into Xcode? I presume that it's a right pain, otherwise perhaps you would have used the distributed compiling built in to the IDE?
[ I've only dabbled with Xcode (I'm a Windows developer by trade :( ) so I don't know if it's of much use for GNU-style builds. ]
Heh. No. GCC pretty regularly breaks binary compatibility for the C++ ABI. Breaks were at GCC 2.95 -> 2.96 (though 2.96 was just a RH/Mandrake thing), 2.x -> 3.0, 3.1 -> 3.2 and 3.3 -> 3.4.
You can't mix C++ compiled with any of the compilers that are on opposite sides of those splits.
Wrong again. Both KDE and the Linux kernel build fine with distributed compilation systems. In fact, KDE has been used with distcc and TeamBuilder for years (TB has been used at the last several KDE meetings) and now there's even IceCream (developed by KDE folks) in KDE's CVS which is sort of based on distcc, but has a central scheduler and does better automatic configuration. It also gets around the first issue above because it's able to build a basic runtime environment based on your system tools (compiler, glibc, etc.) and ship it over to the host machines to build in a chroot environment.
The kernel also works just fine in a distributed build environment; I build it regularly with such.
err.. mods, this is completely unrelated, not informative. i hope you can read
Isn't icecream meant to be superior to distcc?
The utility that comes installed by default with SUSE called 'cook' does the same thing. How is distcc any better than cook? Has anyone done any comparisons?
Veni Vidi Vici
Compiling your own software is something you don't really want to do on systems at work, especially high-end systems, because it becomes a manageability nightmare. Using packaged binaries on modern distros, when possible, means that you know the software doesn't have issues being compiled in that computing environment (gcc and other libraries), may have additional tweaks that are platform-appropriate, and, most importantly, is externally verifiable. Being able to run md5sum on binaries and match them against those running elsewhere, or to verify the entire system and find all the files that have changed from a stock configuration, is very useful, and that is crippled when you can't expect the same binary as anyone else on the planet.
For every problem, there is at least one solution that is simple, neat, and wrong.
Worryingly the article does not mention *at all* the obvious security questions. If you run a distcc service on a host then who is authorized to connect to it and compile programs? How do they authenticate? What about protection against man-in-the-middle attacks (you may not be paranoid enough to worry about people fiddling with the object code before it is sent back, but at least you ought to know if it's possible). I hope it's not another case of 'ignore security in the service, but it's okay, we'll just put it behind a firewall'.
FWIW, distributed compilation programs like distcc are a good reason to check for buffer overruns and other memory trampling in the compiler. If you've ever managed to segfault gcc by feeding it a bad piece of code, there is a potential exploit via distcc if you can craft a C program that makes the compiler misbehave in the way you want.
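As far as access control goes, distccd's own mechanism is address-based: its --allow option takes an IP address or network/mask, and DISTCC_HOSTS entries of the form user@host send jobs over SSH instead of plain TCP. The sketch below is not distcc's code; ipv4_allowed is a hypothetical function showing what an address check does, and why it falls short of authentication: any client on an allowed network (or able to spoof such an address) is let in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical address-based allow check: accept a client only if its
 * IPv4 address lies inside network/prefix.  This says nothing about WHO
 * is connecting, only from WHERE. */
bool ipv4_allowed(const char *client, const char *network, int prefix)
{
    struct in_addr c, n;
    if (prefix < 0 || prefix > 32)
        return false;
    if (inet_pton(AF_INET, client, &c) != 1 ||
        inet_pton(AF_INET, network, &n) != 1)
        return false;  /* unparsable address: reject */
    /* Netmask in host byte order; shifting a 32-bit value by 32 is
     * undefined, so a /0 prefix is handled separately. */
    uint32_t mask = prefix == 0 ? 0 : UINT32_C(0xFFFFFFFF) << (32 - prefix);
    return (ntohl(c.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}
```

A check like this keeps strangers off the daemon only as long as the network itself is trusted; it does nothing about the man-in-the-middle question, which is where an SSH transport helps.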
-- Ed Avis ed@membled.com
Tuesday, 9:00 am:
Updated, optimized apps! (Profit!!!)
If the custom builds manage to save just a few seconds, it's seconds of MY time, not idle time. Big difference.
Gentoo makes compiling your apps easy and fun!
Besides, there's a warm fuzzy feeling to be gained from actually using the source code for all those open-source applications.
FIXME: Add a sig here
Stuff like this always makes me laugh. You read about it, study it, read the manpages, install the package, then install the source package, recompile from source, and voila, you have a tool that will save you maybe one minute per month, and you've only spent a few hours on it to get it. Also any open source tool will be available with at least 5 independent alternatives, each with 5 different branches because developers went their different ways after disputes. So now you have to test and compare them all to see which one suits you the best.
Xcode build system: Distributing Builds Among Multiple Computers
Yes, Apple has come standard with distcc for quite some time.
“Common sense is not so common.” — Voltaire
Corporate Gadfly
Jonathan Archer: the most beaten up Enterprise captain in Star Trek history
Well, if you fancy learning XUL & co., you can use Mozilla for more or less anything you like; just look at all the extensions out there. You couldn't do that so easily with Netscape 3.
thank God the internet isn't a human right.
with your backward views.
What, 12 hours of compile time (plus the wait between emerge sync and emerge -u world because you forgot to chain the commands: emerge sync && emerge -u world)?
:-)
12 hours to get "a few seconds saving"?
Fine if you genuinely don't touch your computer between 9pm and 9am, but the rest of us enjoy hacking 24/7
It's amazing how many thoughtless fools knock the time and effort that has gone into writing open source software like this. Most of you wouldn't have an operating system if it weren't for time-saving tricks like these. DistCC may not be the perfect solution yet, but it has certainly saved me a crapload of time in development, not to mention on my horde of Gentoo boxen. Sure, I wasted a lot of time setting it up, but at least I had fun.