A Distributed Front-end for GCC

Don't need the same version? by Anonymous Coward · 2002-10-12 03:38 · Score: 1, Interesting

That doesn't make too much sense. What if I had 50% 2.9 machines and 25% 3.2 machines, and a bunch mixed in-between? How would it know which version I wanted my program compiled with?

Re:Don't need the same version? by Angry+White+Guy · 2002-10-12 04:04 · Score: 5, Informative

From the FAQ:

distcc doesn't care. However, in some circumstances, particularly for C++, gcc object files compiled with one version of gcc are not compatible with those compiled by another. This is true even if they are built on the same machine.
It is usually best to make sure that every compiler name maps to a reasonably similar version on every machine. You can either make sure that gcc is the same everywhere, or use a version-qualified compiler name, such as gcc-3.2 or gcc-3.2-x86-linux.

So in other words, keep them close, especially for gcc versions that break backwards compatibility.

--
You think that I'm crazy, you should see this guy!
Re:Don't need the same version? by j.beyer · 2002-10-12 05:27 · Score: 1

gecc (http://gecc.sf.net) takes care of that and finds a compile node that uses the same compiler. Think of something like Mozilla, which uses both C and C++: gcc and c++ could invoke different compilers (or at least different versions).

GCC version by csnydermvpsoft · 2002-10-12 03:38 · Score: 5, Insightful

The machines don't have to be identical or be running the exact same GCC version

Well, to some extent they probably do. If you're running GCC 3.2 on one, you wouldn't be able to run 3.0 on another because of binary incompatibility.

Re:GCC version by Anonymous Coward · 2002-10-12 04:02 · Score: 2, Informative

The binary incompatibility issue only exists for C++, but it is still very important. I took a brief look at the distcc manual, and it did mention that you may want to use the same version of g++ on all machines if you are compiling C++. I would change "may" to "must"; you really don't want to take the risk of having incompatibility bugs.

Regardless of the language, if some of the machines are running a different OS, or have a different architecture, they will have to do cross-compilation and things will get more complicated.

Colin Howell
Re:GCC version by Ed+Avis · 2002-10-12 04:37 · Score: 2

If all the machines _are_ identical and have a common filesystem, then it might be a bit quicker to use something simple like Doozer or PVM-enabled GNU make. But if compile time is dominated by just crunching the code then it might not make too much difference.

--
-- Ed Avis ed@membled.com

hmmm by Anonymous Coward · 2002-10-12 03:39 · Score: 5, Funny

Yay! My 133 doesn't have to take 25 billion years to compile anymore! Uhm, wait, I don't have any other computers... Shoot.

Re:hmmm by stevey · 2002-10-12 03:53 · Score: 5, Informative

In that case you might like to look at ccache which is a compiler cache for a single machine.

It caches the compiler options for each source file along with the resulting object file. I use it a lot when I'm building software packages, which requires multiple compilations. It works very nicely; I'd love to see how well it would integrate with distcc....
Re:hmmm by Blkdeath · 2002-10-12 04:15 · Score: 3, Insightful

In that case you might like to look at ccache
Isn't the default cache size somewhere to the tune of 2-4GB?
I recall that all of my lower-powered machines were lucky to see a 6GB drive, let alone have 2-4GB to spare.

--
BD Phone Home!
Shameless plug. Like you weren't expecting it.

Interesting approach by PineGreen · 2002-10-12 03:39 · Score: 5, Interesting

The Sun compiler suite comes with dmake, which does the same thing at the level of make rather than cc, but is essentially the same.
It would definitely make Beowulf clusters interesting for compilation as well as hard-core numerics (no joke intended).

Re:Interesting approach by csnydermvpsoft · 2002-10-12 03:46 · Score: 3, Insightful

Definitely would make beowulf clusters interesting for compilation as well as hard core numerics (no joke intended).

Actually, you wouldn't need a Beowulf cluster at all - just a bunch of networked machines.
Re:Interesting approach by j.beyer · 2002-10-12 05:34 · Score: 1

But then you need all the header and library files on every compilation node. Neither distcc nor gecc (http://gecc.sf.net ) needs this.
Re:Interesting approach by tqbf · 2002-10-12 07:23 · Score: 2

It's not the same; you need a homogeneous build cluster (and the cluster needs to be the same as your build target) to use distributed make.
Re:Interesting approach by Sanga · 2002-10-12 12:33 · Score: 1

distcc did not work with SUN last time I checked. Any updates on that front?

Not N by Nashirak · 2002-10-12 03:44 · Score: 5, Informative

You can almost never achieve a speedup of N. You can achieve S(N) = T(1)/(T(1)*alpha+((1-alpha)*T(1))/N+T0), where T(1) is the time it takes to run the task on 1 computer, alpha is the fraction of the task that cannot be parallelised (as in startup registers etc.) and T0 is the communications overhead of the task.

Just to clarify. :)

Re:Not N by Anonymous Coward · 2002-10-12 03:53 · Score: 2, Funny

That sounds like how they calculate my mortgage payments!
Re:Not N by Yokaze · 2002-10-12 03:59 · Score: 1

Can you greet Mr. Amdahl from me? :)

And actually, this is also not quite correct. In practice, it is possible to achieve super-linear speedup, although it is unlikely. (N times as much memory is more the culprit than the additional processing power.)

While we're speaking of distcc, one could also mention the Group Enabled Cluster Compiler. I have to admit I've never used one of those.

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Re:Not N by Kashif+Shaikh · 2002-10-12 04:04 · Score: 1

Another reason you won't get an N-fold speedup (but rather compilation problems) is that not all source packages have Makefiles written properly enough to allow make -j2 (run 2 jobs in parallel).

For example, make will try to run two sub-makes at the same depth level at the same time, but the makefile hasn't specified a dependency between them. For example:

SRC_DIRS = ha_fsd \
fsd_lib \ ... \

Unless you put ha_fsd: fsd_lib somewhere, you will run into compilation problems with sub-makes and make -j. So my question is: how does distcc fix such dependency problems?

AFAIK, Samba's makefiles are written properly so you can do make -j2, so maybe it was designed for Samba (distcc is part of the samba.org domain...go figure).
Re:Not N by wwwojtek · 2002-10-12 04:07 · Score: 1

S(N) = T(1)/(T(1)*alpha+((1-alpha)*T(1))/N+T0)
That's o(N) - the asymptotic speed of N.
Re:Not N by Lionel+Hutts · 2002-10-12 04:56 · Score: 2

Assuming you meant the expression was o(1/N) (time, or a "speed" of o(N)), you are incorrect. In fact, the speedup is not merely O(N) but theta(N) (holding the other factors constant, of course).

God gave you a shift key for a reason. Learn to master it. Or, alternatively, don't use terminology you don't understand.

--
I Can't Believe It's A Law Firm, LLP does not necessarily endorse the contents of this message.
Re:Not N by rbolkey · 2002-10-12 05:10 · Score: 1

Anyone else read the article summary and just know there was going to be a comment saying something about 'n' not being accurate?
Re:Not N by Kashif+Shaikh · 2002-10-12 05:41 · Score: 1

Well smartass,

Try running make -j on source packages: not all of them will compile (case in point: SGI's Linux FailSafe). The sub-makefiles assume a library is already built and try to reference it via gcc -L../somelib/dipshit.lib. The only thing is, dipshit hasn't been compiled yet.

And try reading the gnu make manual about this.
Re:Not N by nivedita · 2002-10-12 06:55 · Score: 1

This is dist*CC*. problems like this will come up only when your source files depend on something else being already compiled, and the Makefile doesn't state this dependency. Exceedingly unlikely, in other words. Linking object files together with libraries is *not* parallelized.
Re:Not N by wwwojtek · 2002-10-13 07:33 · Score: 1

I surely did not mean o(1/N) but just O(N). Kill me for not using the shift, which by the way you are also misusing:
(...) a "speed" of O(N)
The rest I agree with. Including that we both should learn to master a shift key.
Re:Not N by Lionel+Hutts · 2002-10-13 08:21 · Score: 2

You have no idea what you're talking about. o(N) means something, which is why I wrote what you meant that way. It happens to be wrong here.

--
I Can't Believe It's A Law Firm, LLP does not necessarily endorse the contents of this message.
Re:Not N by wwwojtek · 2002-10-13 15:24 · Score: 1

Whatever - don't try to give the impression that knowing what o(N), O(N), omega(N) or theta(N) mean is rocket science. You're being a pain in the neck just because I mistyped "O(N)" as "o(N)". If you want to troll, that's your call; now go search for typos elsewhere.
Re:Not N by cakoose · 2002-10-24 15:00 · Score: 1

I'm pretty sure C doesn't have compile-time dependencies in the sense that some files must be compiled before others (at least with everything I've done). Since the interface and implementation are in different files, only the contents of the header file are required to produce object files (and even this isn't a strict requirement; apparently a missing declaration only warrants a warning). In a language like Java, however, there is such a dependency because of the mix of interface and implementation (I don't mean Java interfaces here, I mean method/variable signatures). I don't know about C++.

While linking, however, there are dependencies, because that's when the actual function call addresses are plugged in (Java doesn't even do this until runtime, which probably accounts for some of its startup sluggishness).

Helpful by LinuxInDallas · 2002-10-12 03:45 · Score: 2

Sure, kernel compiles are fairly speedy but large projects still take forever to compile. Even on my AthlonXP 1900+. I downloaded the latest CVS snapshots of KDE in an attempt to de-bluecurve my new RH 8.0 machine and it took a couple hours to compile everything. Depressing.

Re:Helpful by pvera · 2002-10-12 04:55 · Score: 2

I guess I should not complain that it takes almost 2 hours to install the gimp in my iBook 600/384MB (10.2) with fink!

--
Pedro
----
The Insomniac Coder
Re:Helpful by j.beyer · 2002-10-12 05:42 · Score: 1

have you tried to compile KDE with distcc or gecc (http://gecc.sf.net ) ?

Great for OpenOffice by IronTek · 2002-10-12 03:50 · Score: 5, Funny

This could really spur the development of OpenOffice.

With 50, 100 machines or so hooked up, OpenOffice's compile time could be reduced to as little as 1 or 2 days!!!

Re:Great for OpenOffice by rbeattie · 2002-10-12 05:15 · Score: 2

Actually, it would be kind of cool if this was part of the default install in Gentoo - along with some P2P program for finding others online who are running the same app. Then you could download the source code and distributedly (is that a word) compile it... As long as your network is fast enough, you could significantly reduce the amount of time compiling, etc. on slow machines.

Just a thought... maybe I'm off base.

-Russ

--
Me
Re:Great for OpenOffice by Electrum · 2002-10-12 07:37 · Score: 2

Actually, it would be kind of cool if this was part of the default install in Gentoo - along with some P2P program for finding others online who are running the same app. Then you could download the source code and distributedly (is that a word) compile it... As long as your network is fast enough, you could significantly reduce the amount of time compiling, etc. on slow machines.

Wonderful. Then you can get rooted when others running the P2P app have modified the compiler to insert trojans into the generated binaries.
Re:Great for OpenOffice by Screaming+Lunatic · 2002-10-12 07:56 · Score: 2

Yeah, now all we need is a distributed method to launch OpenOffice.
Re:Great for OpenOffice by rbeattie · 2002-10-12 08:34 · Score: 2

Ahh... that would be a problem, wouldn't it.

Doh!

Okay, how about a super-cool, digital-signature based P2P system based on an unbreakable trust matrix? Yeah, right. OK forget it.

-Russ

--
Me
Re:Great for OpenOffice by hany · 2002-10-14 01:48 · Score: 1

Okay, how about a super-cool, digital-signature based P2P system based on an unbreakable trust matrix? Yeah, right. OK forget it.
Better thing to do: Patent it!

--
hany

So now we need to.... by Skal+Tura · 2002-10-12 03:52 · Score: 1

So now we need to make a Distributed coder software...

--
Pulsed Media Seedboxes

YES! N! Re:Not N by angel'o'sphere · 2002-10-12 03:55 · Score: 5, Informative

You can almost never achieve a speedup of N. You can achieve S(N) = T(1)/(T(1)*alpha+((1-alpha)*T(1))/N+T0), where T(1) is the time it takes to run the task on 1 computer, alpha is the fraction of the task that cannot be parallelised (as in startup registers etc.) and T0 is the communications overhead of the task.

This is the textbook result: Amdahl's law, IIRC.

In reality, and also in most text books, there are exceptions where the solution scales with the number of processes.

And it should be easy enough to see: 5 machines compiling one source file each are 5 times as fast as one machine compiling 5 source files.

As long as you start gcc 5 times in a row you have
the same initialization overhead for EACH instance of gcc one after the other.

If you manage to start gcc with several source files as arguments, you at least save the time spent loading the binary. That would correspond roughly to the alpha value.

Amdahl's law is useful for a single program/problem: try to parallelize gcc itself and you will find that compiling a source file can't be sped up very much. So 5 processors running several threads of one gcc instance do not give a speedup of 5.

However it says nothing about just solving the same problem multiple times in parallel.

Regards,
angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:YES! N! Re:Not N by joib · 2002-10-12 04:04 · Score: 4, Informative

That assumes you can divide the work equally. Consider that the number of source files probably isn't a multiple of N, and that different source files take varying times to compile. Of course, as the number of source files approaches infinity, and if you have some load balancing scheme, this becomes a non-issue. But in Real Life (TM) most projects don't have an infinite number of source files.
Re:YES! N! Re:Not N by pivo · 2002-10-15 02:29 · Score: 2

Unless you have a very small project with files of vastly different sizes, the size of each file won't matter that much. What will be more problematic is lots of dependencies. For example, if 7 machines are frequently waiting for one machine to compile a file on which their compilation task depends, then you'll lose a lot of the benefit of parallel compilation.

Apple Projectbuilder by selderrr · 2002-10-12 03:58 · Score: 3, Insightful

I sincerely hope Apple builds this feature into Project Builder, which compiles insanely slowly compared to CodeWarrior. If it weren't for the superior interface and integration with Interface Builder, I'd switch back to CodeWarrior right away.

Does anyone here know how big the speed increase is when compiling on dual G4s versus a single processor?

--
When will I end this grieving ? When will my future begin ?

Re:Apple Projectbuilder by anarkhos · 2002-10-12 14:09 · Score: 1

superior interface?

Yech!

It has all the interface problems of TextEdit, plus a whole bag more. I mean cripes, it doesn't set the type for the files, it loses track of files if the path changes, its text views don't behave like Mac text views, the menu items ought to be completely reorganized, etc.

What the hell is so good about it?!

--
>80 column hard wrapped e-mail is not a sign of intelligent
>life

So, is it better? by FreeLinux · 2002-10-12 03:58 · Score: 5, Interesting

Is this better than say, Group Compiler?

Differece between distributed/parallel make? by angel'o'sphere · 2002-10-12 03:59 · Score: 4, Interesting

Could someone please point out the difference between this and a parallel and/or distributed make, like pmake?

It doesn't sound very reasonable to put the coding work into gcc when you'd also like yacc/bison, a bunch of perl scripts and whatever else is in your makefile sped up.

Regards,
angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Differece between distributed/parallel make? by Ektanoor · 2002-10-12 04:43 · Score: 2

As far as I could see in the distcc docs, the words parallel and distributed are used to mean practically the same thing. However, the systems described in the /. story may have some fundamental differences in their internals.

It has been a long time since I tried pmake, and frankly I didn't like the thing. It was always segfaulting for some reason, while the cluster did much more complex jobs without a hassle. Unfortunately the cluster has lived out its life and now I don't know how things stand.

It is not reasonable to think you can get by with just some yacc/bison/perl and a well-made Makefile. Here, things are much more complex. The problem is not compiling every file on a parallel machine but splitting the whole code into small pieces. And that's hard. Some things are capable of going parallel, others not. There are several algorithms to determine what may go parallel or not. There are also some general and analytical methods that adapt data to go parallel right from the start. Besides, every distributed/parallel system needs to exchange information about the intermediate steps that every process needs to continue its work.

Frankly, I don't know the details of how parallel compilation works, but I can guess from my limited experience with parallel computers. Imagine a photo or matrix that is divided into small pieces and sent out to several machines. Every time a calculation touches the edge of a piece, the system needs data located on other machines. There are a few methods for passing this information; the most popular is MPI, the Message Passing Interface. However, it is not a universal solution. When the data is too heterogeneous and the calculations don't fit a common method, MPI and its cousins are a hassle to deal with. Compilers are one of these cases, as we are always dealing with different files, different compilers and tons of different interactions among the data. To create systems capable of parallel compilation, we would need other approaches. At the very least, such systems should make the fact that compilation is being done in parallel completely transparent to the developer, or else they will be useless. Imagine a developer, while creating a new application, being forced to consider whether his code can be compiled in parallel or not. That would be a huge burden on development for any app the size of Mozilla or larger.

binary incompatibility by crow · 2002-10-12 04:05 · Score: 2

I thought that the binary incompatibility was only a C++ thing. So for some projects, that's an issue, but not for all. Of course, the idea of being careless about which compiler version you're using for building a large project is rather strange.

Big benifit by LoudMusic · 2002-10-12 04:07 · Score: 5, Interesting

I think the biggest plus is that you can have one hella fast machine on your network running distcc that basically does all your compiling for all your other machines. I can see this being a big bonus for server farms like rackspace.com. The customers would be getting compile speeds from a big ass server, rather than just their little dinky Duron.

~LoudMusic

--
No sig for you. YOU GET NO SIG!

I would recommend that you use C by Anonymous Coward · 2002-10-12 04:10 · Score: 1, Funny

Or maybe ASM.
By the way, have you tried Linux?

Perfect for Gentoo by waffle+zero · 2002-10-12 04:11 · Score: 3, Insightful

Whether you're looking to install Gentoo on an old Pentium to use as a router or sacrificing your firstborn to compile KDE, it should make things go quite a bit faster.

Well unless every computer you own runs Gentoo you want to emerge world.

Only true for C++ by FooBarWidget · 2002-10-12 04:11 · Score: 4, Informative

The C ABI is compatible across *all* GCC versions (and probably other compilers too). You can compile libgnome using GCC 2.95.2 and Nautilus using GCC 3.2 and not have any problems at all.

Re:Only true for C++ by Scott+Wood · 2002-10-12 06:15 · Score: 1

On all *official* GCC versions, perhaps (if you ignore #ifdefs based on the GCC version, for the purpose of using new features or avoiding bugs in certain versions). However, Red Hat's 2.96 abomination is *not* compatible with other GCC versions, even for C. It ignores the "aligned" attribute when laying out structs, so if you have a struct containing types with specified alignment (I'm not sure what it does if you specify the alignment for the datum itself), that struct will have a different layout with 2.96 than with other GCC versions.
Re:Only true for C++ by FooBarWidget · 2002-10-12 06:34 · Score: 2

> However, Red Hat's 2.96 abomination is *not*
> compatible with other GCC versions, even for C.

That's plain wrong. My GNOME 1 libraries are compiled by RedHat's gcc 2.96. All the GNOME 1 apps I have compiled myself are compiled using GCC 3.0, and they work just fine.

Maybe that "align" attribute can cause problems but it's disabled by default, and I have yet to see any desktop app using that flag by default to compile.
Re:Only true for C++ by Scott+Wood · 2002-10-12 09:05 · Score: 1

So perhaps GNOME doesn't use the "aligned" attribute. One example of something working does not mean that everything is compatible. And what exactly do you mean by "disabled by default"? It's not a flag you turn on, but rather a feature that code can use. You don't see it being used unless you look at the code.
Try the following code on 2.96, and compare the results with any other version of GCC (x86 only; the bug doesn't seem to happen on Alpha, and I haven't tried other architectures):
#include <stdio.h>

typedef struct { int foo, bar, baz; } __attribute__((aligned(16))) s1;
typedef struct { int x; s1 y[2]; } s2;

int main(void)
{
    /* sizeof yields a size_t, so cast it for a portable printf */
    printf("%lu\n", (unsigned long)sizeof(s2));
    return 0;
}

Sorry about the formatting, but slashdot doesn't allow <pre> for some reason, and <ecode> strips out indentation.
Re:Only true for C++ by FooBarWidget · 2002-10-12 09:16 · Score: 2

GNOME may be "one" example, but the majority of the apps don't use that feature. In fact, I've never seen code containing __attribute__((aligned(16))).
Re:Only true for C++ by Scott+Wood · 2002-10-12 09:28 · Score: 2, Interesting

Regardless, it is a point of incompatibility. The reason I'm aware of the bug is because of reports I've seen of customers linking our code against code built with 2.96, and experiencing failures as a result. Thus, code exists that is affected by the bug. Whether the "majority" of apps use this feature, or whether the apps you use use it (have you grepped every bit of source code you run? You may find the Linux kernel to be an interesting place to look...), is irrelevant.
And no, the answer is not to remove the alignment, as it speeds up array accesses by making the size of the struct a power of 2.

Could you imagine? by Quill · 2002-10-12 04:11 · Score: 1, Funny

Could you imagine a...

nevermind.

--
My religion forbids the use of sigs.

Unoptimized (or poorly optimized) programs by ajp · 2002-10-12 04:17 · Score: 1

Distributed Front End may be a bit of a misnomer. It appears this is a distributed preprocessor (article says it farms out preprocessed source to different copies of gcc.) Gcc presumably puts the preprocessed source through its own front end (parser), back end (optimizer), and produces binaries which are then linked on the main machine.

What's the problem? Optimizations are probably limited to each separately compiled module. Most optimizations will perform better across a larger code base. (Read the Dragon Book chapter 10 before telling me I'm wrong.) This method may produce valid code optimized by module but the code is nowhere as good as it could be. Making debuggable code is another challenge.

(If they ARE doing multiprocess optimization then I'm duly impressed. I doubt it, though.)

Re:Unoptimized (or poorly optimized) programs by be-fan · 2002-10-12 04:59 · Score: 2

This is true. Unfortunately, traditional makefiles tend to encourage compiling each file separately, so you have to use workarounds like ICC's .il file mechanism to do global optimizations. However, for developers, this distributed processing is a big boost. When you're working on code, you have to recompile a project repeatedly, and distributing the workload pays off in decreased frustration. For those intermediate builds, optimizations don't really matter anyway.

--
A deep unwavering belief is a sure sign you're missing something...
Re:Unoptimized (or poorly optimized) programs by khuber · 2002-10-12 07:01 · Score: 1

When you're working on code, you have to recompile a project repeatedly,
You should not need to recompile the entire project repeatedly during development, only source which has changed and dependencies on that source. And yes I've seen dumb developers that recompile the entire project unnecessarily.
-Kevin
Re:Unoptimized (or poorly optimized) programs by be-fan · 2002-10-12 07:19 · Score: 2

Of course, I'm assuming the developer is smart enough to have a proper build system set up. But when a program is in active development, you'll often have modified files that can take a significant amount of time to recompile. And god forbid you modify a common header. Even if you add one constant to support one source file, almost all build systems will recompile every single file that includes that header.

--
A deep unwavering belief is a sure sign you're missing something...
Re:Unoptimized (or poorly optimized) programs by Hast · 2002-10-18 10:22 · Score: 2

Unfortunately GCC is really poor at optimizing in any case, so it's pretty much moot.

Just compare GCC at max optimization and SUN cc at minimum or first level. SUN cc beats GCC even there.

So while more data does give better optimization, I don't think it's a very pressing issue for GCC front ends.

good software design by ucblockhead · 2002-10-12 04:19 · Score: 4, Insightful

If your system is well designed, compiling the entire thing should be a rare event. In a well designed project, most changes occur in c files or in headers only included in a few c files, so most changes only require compiling a very few files.

Compiling the whole source tree should be the sort of thing you do fairly rarely (for a big project), perhaps once a night, perhaps automated so no one has to watch it.

If compile time is something that is a significant problem for you, you really need to look at your code design.

--
The cake is a pie

Re:good software design by The+Ego · 2002-10-12 05:39 · Score: 1

That's BS. Software design won't save you when you are modifying the makefiles, when you are touching a header file that is used everywhere, or when you have to do a clean build just to ensure that you won't screw up everybody simply because make is such a crappy/unreliable build tool. A good deal of the blame is also shared by C/C++ header files, which are a pretty antiquated way to specify an API.

Granted good software design should ensure that you don't have to do this often. However the typical build tools make it way too costly to fix such issues that force big recompiles. As a result they often remain unfixed.

I wish some vendor had the guts to promote a replacement for makefiles that would allow higher-level semantics to be exploited to reliably speed-up builds. Note: Apple is already doing good by promoting jam, even if jam is "just" a better make.

I had great hopes to see such a tool emerge from CodeSourcery's software carpentry contest. Too bad they didn't have the time to get to it.
Re:good software design by Ninja+Programmer · 2002-10-12 17:01 · Score: 1

The original poster is correct when he says a well-designed project has very few "recompile from scratch" events. If you think that's BS, it's because you don't have a well-designed project.

1) The first line of my makefile is always:

#if 0

And the last line is always:

#endif

This way I can include the makefile directly into the sources innocuously. Couple this with auto-dependency generation, and it's just a matter of noting which files depend on the makefile and which don't.

2) If you are touching your header files and this is causing frequent "everything needs to be recompiled" problems, then you have poorly designed header files. Most header file activity should involve interfaces between as few modules as possible. Global header files should mostly contain agreed-upon conventions and absolute high-level design APIs that should have been decided from the very beginning and should not change.

Now if only this was integrated with CVS by Magnus+Pym · 2002-10-12 04:21 · Score: 1

One of the most useful things about ClearCase is its ability to "wink in" object files from other developers' views. That means that if one developer on the team has built a version with a certain time stamp, that particular object file never has to be built by anyone else in the group unless some dependency changes. It would kick ass if CVS had that capability.

Magnus.

Re:Now if only this was integrated with CVS by boots@work · 2002-10-12 23:15 · Score: 1

The fact that ClearCase seems to slow down file operations by a factor of about 2 to 10 makes this argument not exactly compelling. I've seen it take 40 seconds to list a short directory, on a very well-hung machine.

Perhaps there are good things about ClearCase, but speed certainly isn't one of them.

That said, the function you described is basically what ccache does -- only without requiring stupid kernel tricks.
Re:Now if only this was integrated with CVS by doomdog · 2002-10-14 04:38 · Score: 1

ClearCase isn't slow -- it must have been your particular setup. If you're seeing slowdowns by a factor of 10, you have serious problems with your network or ClearCase configuration. I've been using it for a few years now, and I've never seen anything like this.

Even if there are file slowdowns involved, the advanced capabilities provided by ClearCase more than make up for it. I pity those people using CVS on large/complicated projects, just because they (a) don't know any better, or (b) are too cheap to pony up the cash for ClearCase...

Check out also ccache.samba.org by GGarand · 2002-10-12 04:21 · Score: 5, Informative

From the ccache homepage (ccache is also a Samba-hosted project):

ccache is a compiler cache.
It acts as a caching pre-processor to C/C++ compilers, using the -E compiler switch and a hash to detect when a compilation can be satisfied from cache.
This often results in a 5 to 10 times speedup in common compilations.

No, NOT N by HisMother · 2002-10-12 04:21 · Score: 3, Interesting

Ahem. Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc. Ignoring communications, it takes 0.550 seconds to compile them on ten machines. Is 5.05/0.550 == 10? No, it's about 9.2. Therefore, the speedup is LESS THAN N. Note that the shorter the actual compile time, the lower the speedup would be!

--
Cantankerous old coot since 1957.

Re:No, NOT N by nivedita · 2002-10-12 06:52 · Score: 1

It takes (0.5+0.05)*10=5.5 seconds to compile 10 source files. When did you ever see a project compile 10 source files with a single invocation of gcc?
Re:No, NOT N by angel'o'sphere · 2002-10-12 08:10 · Score: 4, Informative

Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc

Your calculation is wrong:

I explicitly said: you start gcc N times for N files.

So a call like gcc 1.c 2.c 3.c ... 10.c is not allowed.

That call falls under Amdahl's law (insofar as a common initialization time is needed, which is shared among the ten compile tasks).

However 10 calls:
gcc 1.c
gcc 2.c
gcc 3.c ...
gcc 10.c

Those ten calls scale by a factor of 10! Running those ten calls one after the other on one machine takes exactly ten times as long as running each of them on its own machine.
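The per-file model can be sketched the same way in Python. Here every file gets its own gcc process, so the 50 ms start-up is paid per file; the timings are the ones from the parent post, communication is ignored, and `per_invocation_time` is a hypothetical helper name.

```python
def per_invocation_time(n_files: int, n_machines: int,
                        init: float = 0.05, per_file: float = 0.5) -> float:
    """Wall-clock seconds when each file is compiled by a separate gcc process
    (start-up paid per file) and the machines work in parallel."""
    files_per_machine = -(-n_files // n_machines)  # busiest machine sets the pace
    return files_per_machine * (init + per_file)
```

Ten files take 5.5 s on one machine and 0.55 s on ten, a ratio of exactly 10 (up to floating-point rounding). The ceiling division also shows the real-life catch: with 11 files on 10 machines, one machine compiles two files and the ratio drops to 5.5.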

I repeat: Amdahl's law is about parallelizing one algorithm. It is not about starting the same algorithm on different problem sets (different C files) in parallel.

Whereas the first one does not scale infinitely or with N, the second one does (of course with some limitations in real life, e.g. if all compilers use the same file server via NFS).

The interesting difference is this: under Amdahl's law there is a maximum number of processors up to which the solution scales. Adding more processors does not make solving the problem faster; very often it actually makes it slower because of communication overhead. OTOH, by just duplicating the hardware and distributing the problem "identically" rather than "divided and parallelized", you indeed get nearly unlimited scale-ups. You scale up to the point where the distributing and the gathering get too expensive (distributing C sources from a CVS repository to the compile farm machines, and gathering the *.o files, or better the *.so files, back during linking).

angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

ccache by pmineiro · 2002-10-12 04:23 · Score: 1

slightly off topic, but i've found ccache to be amazing at speeding up compiles when developing code.

it basically hashes (after a cpp pass) and caches. a lot of times one has to make clean, tweak a Makefile.am, change a preprocessor variable, or work with multiple different branches, such that most source files are still exactly the same. in that case, huge speedups.

-- p

Distributed compiling by Zakabog · 2002-10-12 04:23 · Score: 1, Redundant

Wow multiple computers compiling one thing, imagine a beowulf clus.... errr nevermind

Source code by ucblockhead · 2002-10-12 04:25 · Score: 4, Insightful

Running different versions could cause really nasty problems if different versions of gcc support different levels of C (like C99 or older C) or if one version has a compiler bug that another doesn't.

Can you imagine code compiling or failing to compile randomly depending on which machine happens to compile it? Yikes! Debugging nightmare...

--
The cake is a pie

Re:Source code by wik · 2002-10-12 06:15 · Score: 5, Insightful

I've had bad experiences with this in a Condor cluster of linux machines which had different versions of glibc. Seemingly randomly, my jobs would blow up into the netherworld without running and without an error message. Until the administrators matched all of the glibc versions (but not the linux distributions, for some reason), I had to compile everything with -static on one machine and pray.

I wonder how much of a problem network bandwidth is in this system. With Condor, moving large datasets between machines is a problem. Object files can be pretty big and if you have a lot of them, you might risk pushing the compile bottleneck to the network. Even worse might be the link step, where all of the objects have to be visible to one machine (gcc doesn't have incremental linking yet, does it?).

--
/ \
\ / ASCII ribbon campaign for peace
x
/ \
Re:Source code by yeOldeSkeptic · 2002-10-12 18:32 · Score: 1

This is an interesting problem. However, my belief is that the different versions of glibc are not to blame; rather, the different versions of /usr/include/asm and /usr/include/linux are the culprit.

If you can somehow make /usr/include/asm and /usr/include/linux the same across the compiling machines, you shouldn't have any problems. Note, however, that /usr/include/asm and /usr/include/linux should come from the kernel version that was used to compile glibc. Failure to realize this is often the cause of many compilation problems.

PS. I am not a kernel hacker, so please take my observation with a healthy dose of skepticism.
Re:Source code by wik · 2002-10-13 02:04 · Score: 2

The problem is at runtime, not compilation. Builds were done on one machine. Beyond the compilation step, header files do not matter, but the programs still crashed with gcc problems. When you start a program, it attempts to dynamically link with glibc shared objects. If the runtime linking doesn't work properly, you generally get a segfault.

It turned out that the way I debugged it was to get login permission on the remote machines and run builds on the machines by hand. I later learned about a number of other switches I could have added to my condor submission script that would have logged stdout and stderr.

This actually has nothing to do with the kernel or header files. It's purely a glibc mismatch.

--
/ \
\ / ASCII ribbon campaign for peace
x
/ \

LSF by skroz · 2002-10-12 04:26 · Score: 2

Oh well, I've been trying to get a distributed kernel compilation system working using Platform's LSF. Guess I can throw that project away. ;)

--
-- Minds are like parachutes... they work best when open.

Is this thing really usefull ? by nomenquis · 2002-10-12 04:33 · Score: 2, Interesting

i recently wrote a distributed compile tool using a different approach, after using distcc for half an hour. the big problem with distcc is that it does all the preprocessing on one machine, which is too much work for that machine in some situations (and limits the total speed increase one can gain). what i did is first run a modified version of make and then distribute all the objects which have to be compiled to the machines. everything is done on these machines (including preprocessing), so the number of compile machines is limited only by the speed of the network (i've been compiling on 60 hosts, with an almost linear speed increase). the only problems with this approach are that the same compiler has to be installed on these machines, and that they have to write to some sort of network-shared filesystem (for the objects). the same compiler is an easy thing, since i've been using the intel compiler version 6 and common system includes (i've put them into a shared include dir). any thoughts about this thing? (or some folks willing to help me create a version based on GPL code so i could release this one?)

Re:Is this thing really usefull ? by truth_revealed · 2002-10-12 04:55 · Score: 1

If you have a large cluster of distcc compile computers it is often better to remove the host machine (localhost) from the environment variable DISTCC_HOSTS so distcc does not actually compile files locally, but instead only does preprocessing, remote machine file delegation and final linking. You might also consider reducing the "make -j" number to reduce network traffic if your network is slow.
Re:Is this thing really usefull ? by nomenquis · 2002-10-12 05:04 · Score: 1

i've tried that, but in the project i'm working on the preprocessing takes WAY too long, so my machine can only serve, say, 6 to 8 clients. this is a severe limitation on scalability. kind regards, -ph-
Re:Is this thing really usefull ? by nomenquis · 2002-10-12 05:30 · Score: 1

i only did the preprocessing on my machine, no compiling, for sure. i did not try ccache, though; imho it won't help much if i have to do the preprocessing on one machine, or did i misunderstand something? another thing to note is that make itself has problems with my project: if i compile with >>40 jobs in parallel, make sometimes creates libs before the objects for the libs have been compiled! and when using make there is another big disadvantage: make sometimes runs far fewer parallel jobs than specified with -j (e.g. when make has to wait for a static lib to be linked). my approach does not have this problem, since i first compile ALL objects and then create the libs / binaries afterwards.
Re:Is this thing really usefull ? by nomenquis · 2002-10-12 06:08 · Score: 1

#1 my makefiles are correct; this is a make issue (really, it took me quite a while to figure out that it's make's fault).
#2 i have more than 40 machines available for compilation (most of them dual or quad, so it's quite common to compile with more than 40 jobs in parallel, but 40 - 60 is the max i can get due to the slow network).
#3 will have a look at it.
#4 first figure out all the objects i have to compile, then compile them on all of the machines, but do everything there (including preprocessing; they have a shared source / object file system). afterwards do all the linking. using this approach i get an almost linear speed increase, limited only by network io.
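The compile-everything-then-link approach from #4 can be sketched in Python. This is only a model of the scheduling idea: the compile and link steps are passed in as hypothetical callables (in practice they might run the compiler on remote hosts over the shared filesystem, which is not shown), and a thread pool stands in for the pool of compile machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def two_phase_build(
    sources: List[str],
    compile_fn: Callable[[str], str],
    link_fn: Callable[[List[str]], str],
    max_workers: int = 4,
) -> str:
    """Phase 1: compile every object in parallel.
    Phase 2: link only after all compiles have finished, so no library
    or binary is ever built from objects that are not there yet."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order; list() waits for every job and re-raises errors
        objects = list(pool.map(compile_fn, sources))
    return link_fn(objects)
```

Phase 1 can never finish faster than the slowest single compile, which is why one very long object (as described in the next reply) caps the benefit no matter how many machines are added.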
Re:Is this thing really usefull ? by nomenquis · 2002-10-12 08:58 · Score: 1

network speed is fine for >40 machines compiling concurrently, and imho a lot of the headers are cached by the systems. and make is definitely not working correctly for me when i'm using a lot of jobs. make by itself is not the best tool for compiling with more than, say, 2 or 4 jobs at the same time. first of all, a make -j4 starts more than 4 jobs at a time under certain conditions, and it can also happen that only one job is running at a time because make is waiting for it to finish. this decreases performance if you have some objects that take REALLY long to compile (in my setup i have an object which takes as long to compile as all the other objects split across 40 machines).

Distcc compiled programs have problem with gdb by truth_revealed · 2002-10-12 04:34 · Score: 1

Distcc has a problem with gdb.
It appears that when a file is preprocessed on the host machine, gcc does not record the source directory in the preprocessed output. When the preprocessed file is farmed out to a remote gcc build machine, the remote gcc compiler (not knowing any better) compiles the file and records the remote machine's directory in the object file. Now when a user tries to debug the program, gdb cannot find the source directories.
Unfortunately, this "debugging" bug has to be fixed within GCC itself. This thread describes how GCC might be patched to allow gdb to work correctly with distcc, but so far no action has been taken.
This is not a huge problem - distcc is still great for production builds.

Re:Distcc compiled programs have problem with gdb by anno1602 · 2002-10-12 06:06 · Score: 1

This is not a huge problem - distcc is still great for production builds.
Uh, yeah, how often do you do production builds? Every 3 months? You can let that one run overnight. It's the time the test builds take that really hurts productivity, and without debugging, they're worthless. So could it be that this is a non-solution to an existing problem?
Re:Distcc compiled programs have problem with gdb by truth_revealed · 2002-10-12 06:31 · Score: 2, Insightful

Why so negative? They are aware of the gdb issue and are working to fix it. This does not make distcc 'worthless' by any stretch.
Even if you do have a crash, gdb will still give you the stack trace of a distcc-compiled -g program, complete with function names and line numbers; you just don't get the source code without setting a directory directive within a gdb session. Big deal. It's perfectly usable.
Sometimes I'll launch production builds mid-day to correct - wait for it - mid-day production problems. This has to be done as quickly as possible, and distcc is very useful.
The gdb thing is by no means a showstopper.

Re:Java by truth_revealed · 2002-10-12 04:45 · Score: 1

First of all, Java builds are around 100X faster than C or C++ builds, so you probably do not need this technology. I have never seen a Java build take more than 5 minutes from scratch (with thousands of source files), whereas a C++ project such as Mozilla built from scratch might take hours. Java compile jobs are much harder to farm out to remote machines because the Java compiler must see all the source files and/or class files related to the compilation of each Java file. This is because Java source combines interface and implementation, whereas C and C++ divide these two aspects: interfaces go in the header files, and the implementation in the .c or .cpp files. So basically you'd have to share the entire Java project source tree via Samba or NFS to each remote node, which defeats the purpose of this exercise. To speed up a Java build, try Jikes. It's around 5 times faster than javac in my experience.

security? by gooofy · 2002-10-12 04:48 · Score: 5, Informative

looks like this one is not necessarily a good idea to run on a university workstation cluster...

1.4 Security Considerations

distcc should only be used on networks where all machines and all users are trusted.

The distcc daemon, distccd, allows other machines on the network to run arbitrary commands on the volunteer machine. Anyone that can make a connection to the volunteer machine can run essentially any command as the user running distccd.

distcc is suitable for use on a small to medium network of friendly developers. It's certainly not suitable for use on a machine connected to the Internet or a large (e.g. university campus) network without firewalling in place.

inetd or tcpwrappers can be used to impose access control rules, but this should be done with an eye to the possibility of address spoofing.
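The kind of rule inetd or tcpwrappers would impose amounts to a source-address allow-list. The Python sketch below is illustrative only (it is not how tcpwrappers is implemented, `peer_allowed` is a hypothetical name, and the example networks are assumptions); as the manual warns, it trusts the claimed source address and so does nothing against spoofing.

```python
import ipaddress
from typing import Iterable


def peer_allowed(peer_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Accept a connection only if the peer address lies in an allowed network.
    This checks only the claimed source address, so it offers no protection
    against address spoofing."""
    addr = ipaddress.ip_address(peer_ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions simply fail
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)
```

For example, with `["192.168.1.0/24", "10.0.0.0/8"]` a peer at 192.168.1.42 is accepted and one at 203.0.113.7 is rejected.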

In summary, the security level is similar to that of old-style network protocols like X11-over-TCP, NFS or RSH.

--
time is a funny concept

Not a new idea, but a noble one anyway by GeckoFood · 2002-10-12 04:48 · Score: 3, Informative

Once upon a time, Symantec had a C++ compiler, and with version 7.5 (1996), the build process could be spread all over a network. This did speed up compilation: more or less idle machines running the build service would be sent files to compile, passing back the objects and binaries as appropriate.

Oh, by the way, that compiler is now called Digital Mars C++.

That said, all the machines on the network had to be running Windows (at that time, I think Windows 95 and NT were the only choices available for that compiler). Further, all had to have the same version of the compiler.

For those of us that are running Linux boxes on a primarily Windows network, this system, whether GCC or something else, would be rather hard to implement without a cross-compiler. Additionally, even if all were Linux workstations (or BSD, or Solaris, etc etc etc) wouldn't binary compatibility be driven by not just the version of the compiler but the target OS as well?

It's a noble undertaking. I hope that the developers are putting thought into all the little things like this that will make it tough to pull off.

--
Be excellent to each other. And... PARTY ON, DUDES!

Re:Apple Projectbuilder - gcc issue, not pb by victim · 2002-10-12 05:01 · Score: 2

I haven't used CodeWarrior for years (got a closet full of shirts though), but I believe you are looking at a compiler difference, not the IDE difference.

gcc is not a fast C compiler. It is a portable C compiler and it makes pretty good code, but it is not fast. g++ is a slow C++ compiler.

The Codewarrior compilers were fast compilers that made pretty good code.

It's all in the tradeoffs.

Engineering time
target portability
source language selection
code quality
compilation speed

You can't have it all. Apple has been adding engineering hours to improve speed with gcc, but gcc will always value source language selection and target portability above all else.

Re:So, is it better? - Quick answer from the site. by panoplos · 2002-10-12 05:09 · Score: 3, Informative

From the Group Compiler [sic] (gecc) site:

gecc is a proof of concept. It is heavily inspired by ccache and distcc. You could chain these tools to achieve the same goals gecc tries to reach. Both tools are much more mature and work in production environments. gecc just started with a slightly different concept: gecc has a central component (distcc does not).

My idea is that gecc could better handle a varying set of compile nodes: if you have some machines that can only help with distributed compiling from time to time, then this is nice.

With a central component it might be easier to monitor the compilation and distribute the compile jobs.

Right now gecc is only useful if you read the source.

Emphasis is mine.

I guess it all depends on whether or not you want to work with production quality code or not.

Best of both worlds: distcc with ccache by truth_revealed · 2002-10-12 05:11 · Score: 2, Interesting

using distcc and ccache

from the above link:

distcc & ccache Has anybody yet thought of integrating distcc & ccache? Yes, of course. They work pretty well as separate programs that call each other: you can just write CC='ccache distcc gcc'. (libtool sometimes has trouble with that, but that's a problem that we will fix shortly and it is a separate question.) This is very nearly as efficient as making them both part of a single program. The preprocessor will only run once and the preprocessed source will be passed from ccache straight to distcc. Having them separate allows them to be tested, debugged and released separately. Unless there's a really strong argument to do otherwise, having two smaller programs is better practice, or at least more to my taste, than making one monolithic one.
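Why chaining costs almost nothing can be shown with a toy model in Python: the cache layer preprocesses once, uses the preprocessed text as its key, and on a miss hands that same text to the distribution layer. This is a sketch of the idea only, not the real interface between ccache and distcc; `make_chain`, `preprocess` and `remote_compile` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict


def make_chain(
    preprocess: Callable[[str], str],
    remote_compile: Callable[[str], bytes],
    cache: Dict[str, bytes],
) -> Callable[[str], bytes]:
    """Put a cache layer in front of a distribution layer. Each build request
    preprocesses the source exactly once; the same text serves both as the
    cache key and as the payload sent for remote compilation."""
    def build(path: str) -> bytes:
        text = preprocess(path)  # the only preprocessor run for this request
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:  # miss: only now is the distribution layer involved
            cache[key] = remote_compile(text)
        return cache[key]
    return build
```

On a repeated build the preprocessor still runs, but the remote compile is skipped entirely.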

Live boot CD? by no_such_user · 2002-10-12 05:22 · Score: 5, Insightful

I'd love a speedup, but the time I'd save compiling would be wasted on having to fully install another linux box. Being able to boot a CD with a live linux distro and this software, and then connect to these slave machines to help compile would be immensely helpful. My linux box is a Cyrix 200MHz PC. Being able to stick a CD into my Athlon 1800 to help the compile would be fabulous.

Re:Live boot CD? by j.beyer · 2002-10-12 06:02 · Score: 1

take a look at gecc (http://gecc.sf.net); it is designed to handle a changing set of compilation nodes. It is a work in progress, but maybe you could take a look at it. It does not require the same header files or all the libs to be installed on all the compilation nodes, only the compiler (gcc).
Re:Live boot CD? by no_such_user · 2002-10-12 06:18 · Score: 2

Looking at my post, I'm afraid that admitting to running linux on a 200MHz machine, Cyrix no less, makes me sound a little less than serious about my setup. I should point out that this is a thinknic PC with a HDD added on, running sendmail, imapd, apache, sshd, and a few other general linux thingies. I don't run X. It's been running for about 2 years, downed only a handful of times when I felt like updating the kernel. It's small and sits in the corner with no complaints.

Perhaps my biggest complaint is how long it takes to compile. Thus, this project is right up my alley. Good job, folks!

Re:Could this be a new business plan? by Line_Fault · 2002-10-12 05:26 · Score: 1

With the way the business plans mesh, I can see a merger with the underpants gnomes coming as soon as next week!

gecc by j.beyer · 2002-10-12 05:31 · Score: 3, Interesting

There is an alternative (http://gecc.sf.net). gecc has a slightly different approach: it has a central component that distributes the compilation to a number of compile nodes. The set of compile nodes may change over time; that is, compile nodes may come and go.
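The central-component idea can be modelled as a small dispatcher in Python: nodes register and unregister at any time, pending jobs go to whichever node is idle, and a job running on a node that leaves is put back in the queue. This is a toy model, not gecc's actual code; the class and method names are hypothetical, and networking, timeouts and node failures detected mid-job are left out.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class CentralDispatcher:
    """Toy central scheduler for a changing set of compile nodes."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.idle: Deque[str] = deque()
        self.pending: Deque[str] = deque()
        self.running: Dict[str, str] = {}  # node -> job it is compiling

    def join(self, node: str) -> None:
        if node not in self.nodes:
            self.nodes.add(node)
            self.idle.append(node)

    def leave(self, node: str) -> None:
        self.nodes.discard(node)
        if node in self.idle:
            self.idle.remove(node)
        job = self.running.pop(node, None)
        if job is not None:
            self.pending.appendleft(job)  # requeue the work lost with the node

    def submit(self, job: str) -> None:
        self.pending.append(job)

    def dispatch(self) -> List[Tuple[str, str]]:
        """Assign pending jobs to idle nodes and return the (node, job) pairs."""
        assigned = []
        while self.idle and self.pending:
            node, job = self.idle.popleft(), self.pending.popleft()
            self.running[node] = job
            assigned.append((node, job))
        return assigned

    def finished(self, node: str) -> None:
        self.running.pop(node, None)
        if node in self.nodes:  # a node that already left is not reused
            self.idle.append(node)
```

Keeping all state in one place is what makes nodes that "come and go" easy to handle; the trade-off is that the dispatcher itself becomes a single point of failure.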

gecc is a work in progress and distcc is much more mature, but maybe you'd like to take a look at gecc too.

(yes, gecc is my baby)

But will you notice the difference? by yerricde · 2002-10-12 05:46 · Score: 1

10? No, it's about 9.2. Therefore, the speedup is LESS THAN N. Note that the faster the actual compile time, the lower the speedup would be!

What about a project with

--
Will I retire or break 10K?

Re:What's the point? by j.beyer · 2002-10-12 05:50 · Score: 1

It really depends on your dependencies. Something like Mozilla scales really well, with distcc and with gecc (http://gecc.sf.net/).

For smaller projects (like 100k source) the gain is smaller.

oops... take 2 by yerricde · 2002-10-12 05:53 · Score: 3, Interesting

Amdahl's law still operates, and you even say so yourself. There's a constant part that cannot be removed. Let's say it takes 50 msec to initialize gcc and 500 msec to compile the average source file. Then it takes 5.05 sec to compile ten files with one copy of gcc.

Then you go on to show that using ten machines provides only a 9.2-fold speedup. But what about a project with 100 files? It would take 50.05 seconds to build everything on one machine, and 5.05 seconds to build ten files on each machine. Now we have 50.05/5.05, or about a 9.91-fold speedup. In practice, can you notice the difference between a 9-fold and a 10-fold speedup?

Does the speedup factor not approach the number of machines asymptotically?
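The asymptotic argument can be written out explicitly. The symbols are introduced here for illustration and do not appear in the posts: t_0 is the one-off gcc start-up time, t_c the compile time per file, n the number of files and N the number of machines. As in the posts, communication is ignored, start-up is paid once per machine, and n is assumed to be a multiple of N.

```latex
S(n) = \frac{t_0 + n\,t_c}{t_0 + \frac{n}{N}\,t_c}
     = N \cdot \frac{t_0 + n\,t_c}{N\,t_0 + n\,t_c}
     \xrightarrow[\;n \to \infty\;]{} N,
\qquad
S(n) < N \ \text{whenever}\ t_0 > 0,\ N > 1.
```

With t_0 = 0.05 s, t_c = 0.5 s and N = 10 this gives about 9.18 for n = 10 and about 9.91 for n = 100: the speedup approaches N as the file count grows but never reaches it exactly, which is consistent with both sides of this exchange.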

(How can I "Use the Preview Button!" when an accidental Enter keypress in the Subject invokes the Submit button? Scoop gets it right by setting Preview as the default button.)

--
Will I retire or break 10K?

Re:oops... take 2 by HisMother · 2002-10-12 06:50 · Score: 2

Yes, it does asymptotically approach N. The OP said you couldn't actually get N in practice, and the original reply said "yes you can." If the reply had been "right, but you can get close" I wouldn't have bothered, but the person was implying that Amdahl's law didn't apply, which is nonsense. It applies perfectly well -- you'll never reach precisely N, plain and simple.

--
Cantankerous old coot since 1957.
Re:oops... take 2 by cakoose · 2002-10-24 14:40 · Score: 1

I agree that Amdahl's law is completely true in its stated context, but there are cases where, because of the increased total memory of a cluster, a clustered parallel implementation will achieve a speedup greater than N.

That said, I don't think that a distributed compile is such a case (though maybe optimization caching could take advantage of the extra memory).

combine it with an object file cache by j.beyer · 2002-10-12 05:55 · Score: 1

you could combine the distribution with object file caching. gecc (http://gecc.sf.net) shows this. It is a work in progress, but you could take a look at it if you are interested.

"No room for parallelism"? Really? by yerricde · 2002-10-12 05:58 · Score: 1

Ever heard of compile time dependencies? Usually a file depends on other files being compiled first.

But not every single file depends on all the previous files.

so there is no room for parallelism.

Most projects with heavy compile-time dependencies have several stages. I'll take the example of the Allegro library: 1. preparing include files, 2. compiling the .o files, 3. creating the library, 4. compiling and linking the driver programs, 5. make install. All the substeps in each major step can usually be done in parallel because they depend only on previous major steps.
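The staged structure falls out of the dependency graph if targets are grouped into "levels", a layered form of topological sorting: each stage contains only targets whose dependencies were all built in earlier stages. A Python sketch; `build_stages` is a hypothetical name, and the target names in the test are made up to mirror the Allegro stages, not taken from Allegro's actual makefiles.

```python
from typing import Dict, List, Set


def build_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group targets into stages. Every target in a stage depends only on
    targets from earlier stages, so each stage can run fully in parallel.
    `deps` maps each target to the set of targets it needs first."""
    remaining = {t: set(d) for t, d in deps.items()}
    # include targets that only ever appear as dependencies
    for d in deps.values():
        for t in d:
            remaining.setdefault(t, set())
    stages: List[List[str]] = []
    done: Set[str] = set()
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        stages.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return stages
```

The number of stages, not the number of files, is what bounds how often the whole farm has to wait; within a stage, the parallelism is as wide as the stage itself.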

--
Will I retire or break 10K?

distcc is pretty sweet :) by accessdeniednsp · 2002-10-12 06:14 · Score: 2, Interesting

I followed the 30-second install overview, and although it took 20 minutes to get working, those 20 minutes included a distributed kernel compile across 2 of my systems here: a p3-500 and a p3-700, 512meg ram each. very nice. i like it!

Another solution: run a gcc cross compiler by truth_revealed · 2002-10-12 06:42 · Score: 1

Run a gcc cross compiler on the remote machine that will make Linux object files. distcc is operating system agnostic. For distcc's purposes you don't even need the target operating system's header files or runtime libraries since the files are already pre-processed on the host machine, and only compilation of .o files takes place remotely.
Here, for example, is how to build a Linux cross compiler hosted on Cygwin.

problems with parallel make by Rubel · 2002-10-12 06:48 · Score: 1

even working on dual-processor machines and using 'make -j4' to allow multiple jobs, the way makefiles calculate dependencies can make it very difficult to get much work done in parallel. other 'make'-like systems, such as Jam/MR, do a better job of evenly building large, multi-directory projects.

Re:Java by khuber · 2002-10-12 06:57 · Score: 1

Maybe your build system is bad. Besides, do you really need to compile the entire thing all the time?

I've seen makefile Java builds that invoke javac for each .java file. That's very inefficient.

Try using ant if you're not already.

-Kevin

Old News. by tqbf · 2002-10-12 07:20 · Score: 2

This was on Sweetcode months ago.

Link Time by wdr1 · 2002-10-12 07:33 · Score: 2

My most common problem these days is *not* compile time, but *link* time. Would be nice if there was a way to speed that up somehow.

-Bill

--
SlashSig Karma: Excellent (mostly affected by moderatio

Re:Link Time by dpt · 2002-10-12 18:02 · Score: 1

Yes. Especially with C++ compilers that don't compile templates until link time.

The HP C++ compiler did this, circa 1995. And it was extremely painful. By default, it wouldn't report to stdout that it was doing more compiling. The link would just take forever for no apparent reason, and our integrators didn't know about this, so they just assumed it had broken somehow ...

Then there's the process problem with this - you would go to link the test executable together, and the *link* would fail with *compile* errors. And so it would turn out you weren't ready for linkage, at all! But you only got to discover this after, literally, hours of the linker grinding away. It wasn't very smart about templates, so it would generate tons of code and compile it *all*!
Re:Link Time by boots@work · 2002-10-13 11:24 · Score: 1

Yeah, linking is now the dominant factor in the Kernel Compile Wars. A 64-way POWER4 machine running 2.5.x could build it in about 4 seconds the last time I checked, and most of that is linking.

A parallelizable linker might make a nice Master's project for someone.

What about openmosix? by DrD8m · 2002-10-12 07:49 · Score: 1

This does the same and more:
http://openmosix.sourceforge.net/

Re:What about openmosix? by boots@work · 2002-10-12 23:05 · Score: 1

This is in the FAQ: I suspect OpenMOSIX will not speed up compilation very much, because small bursty tasks, like compilers, probably perform poorly on MOSIX's process-migration architecture. As I understand it, it should work much better for interactive use or scientific computing where the tasks can usefully stay on each machine for a long time.

But if you want to post some comparative performance measurements, I'm sure everybody would find them interesting.

Anyone get this working for OSX? by Zyzebo · 2002-10-12 07:53 · Score: 1

Like one of the posters below, I'm pretty annoyed that there isn't much support for faster compile options in Project Builder yet (even parallel compiles are pretty buggy). We tried using an earlier version of distcc on OSX without much luck but it's been a couple of months so we'll have to try it again. Certainly after having the luxury of Incredibuild for our PC compiles, we're looking forward to a Mac solution for distributed compiles.

The Irony.... by OakLEE · 2002-10-12 08:35 · Score: 1

How ironic, I got an ad for Visual Studio .Net when I opened up this page.

--
The sun beams down on a brand new day, No more welfare tax to pay, Unsightly slums gone up in flashing light...

Re:Apple Projectbuilder - gcc issue, not pb by selderrr · 2002-10-12 09:16 · Score: 2

I agree that gcc has different goals, but come on, the speed difference is really flabbergasting! CodeWarrior is twice as fast! I can understand a speed loss due to portability, but that much??

Anyway, it wouldn't matter that much if there were a speed gain from dual processors or distributed compiling. So to restate the question: do you have any idea whether PB is faster on a dual 1GHz than on a single 1GHz?

--
When will I end this grieving ? When will my future begin ?

One more alternative by CreamsicleSeventeen · 2002-10-12 09:19 · Score: 1

There's also compilercache. The homepage details some of the benefits of using it to supplement make when, for example, changing compiler flags.

Headers by ucblockhead · 2002-10-12 13:13 · Score: 2

If you have headers included everywhere that have to be changed often, you've got a bad design.

You'd be amazed at how often it is easy to get rid of those stupid "include everywhere" headers.

--
The cake is a pie

got a distcc windows/linux cross compiler working by truth_revealed · 2002-10-12 15:44 · Score: 1

It took me several hours to correctly build a gcc-3.2 Windows (Cygwin) hosted, Linux-targeted compiler (see instructions in parent) and to build distcc on Cygwin, but the thing actually works. I can launch a `make' on Linux and distcc actually performs the compiles via the cross compiler on Windows.

The Windows box is several times faster than the Linux box in the test. I have distccd (the distcc daemon) running on the Windows box. The linux cross compiler on the Windows box is named 'i686-pc-linux-gnu-g++' so it is not confused with my native Windows Cygwin compiler.

Timing for a completely local build on Linux:

(linux) $ time make CCC=g++
g++ -c FileA.cpp
g++ -c FileB.cpp
g++ -o Program FileA.o FileB.o
7.40user 0.25system 0:07.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3097major+7464minor)pagefaults 0swaps

And now a timing with distcc farming compiles to the remote Windows box:

(linux) $ export DISTCC_HOSTS='windowsbox'
(linux) $ time make -j2 CCC='distcc i686-pc-linux-gnu-g++'
distcc i686-pc-linux-gnu-g++ -c FileA.cpp
distcc i686-pc-linux-gnu-g++ -c FileB.cpp
distcc i686-pc-linux-gnu-g++ -o Program FileA.o FileB.o
1.09user 0.39system 0:03.44elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2059major+1207minor)pagefaults 0swaps

I know my test file sizes are a joke, but it shows a distcc distributed build time of 3.44 seconds versus a regular (non-distributed) build time of 7.66 seconds.
Twice as fast as a local build.

So put those idle Windows boxes to work!

Good for boot strapping too by CustomDesigned · 2002-10-12 15:52 · Score: 2, Interesting

I did a simpler version of this many years ago. We had a 68K unix system from Motorola. The pcc compiler was really really bad, so we installed gcc. (I've heard a lot of complaints about how gcc optimizer is no good, but it beat the pants off of pcc.)

Later, we upgraded to Motorola 88k. The new system came with a GreenHills compiler, which was so buggy it was unusable. The bugs were acknowledged, but never fixed. It was too buggy to compile gcc; I wasted many days simplifying expressions by hand to no avail.

My solution was to build a cross compiler on the 68k and run it with a stub for cc1 on the 88k that fed the preprocessed source to the compiler on the 68k and got back the assembler source. (There was no 88k support in gcc yet, so I had to write my own machine model. It was incompatible with Greenhills in passing and returning structs.) The preprocessor, assembler, and linker would run on the 88k; only the actual compiler pass ran on the 68k. This worked beautifully to build gcc on the 88k! And our 33MHz 88k was so much faster that I built a 68k cross compiler for the 88k and sped up compiles on the 68k a great deal.

Next we upgraded to Power PC. AIX comes with no compiler at all! (IBM's compiler is very good - but expensive.) Fortunately, it comes with an assembler and linker. By dint of copying headers from AIX, and hand preprocessing and compiling to assembler, I got the preprocessor running on AIX. Then it was simple to run the cc1 stub again to compile up a native gcc for AIX. The new PowerPC was so much faster than the old 88k, that a cross compiler to speed up 88k compiles was in order also. (I contributed AIX shared lib support for gcc.)

Recently, I fired up a PowerPC cross compiler on our 600 MHz PII running Linux to speed up compiles on our old 100 MHz PowerPC AIX box, using the same simple technique. By using gcc's -pipe option, the compiler on Linux runs in parallel with the preprocessor and assembler on AIX - truly efficient.
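The -pipe option tells gcc to use pipes rather than temporary files between its stages, so a later stage can start consuming output before the earlier one has finished. As a rough model only (threads and queues standing in for processes and OS pipes, and toy string transforms standing in for the preprocessor, cc1, and the assembler), the overlap looks like this:

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Each stage handles chunks as soon as they arrive, so all stages
    # can be busy at the same time on different chunks.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(chunks, *stages):
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for chunk in chunks:
        queues[0].put(chunk)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    preprocess = lambda s: s + " [cpp, local]"
    compile_ = lambda s: s + " [cc1, remote]"
    assemble = lambda s: s + " [as, local]"
    print(run_pipeline(["chunk1", "chunk2"], preprocess, compile_, assemble))
```

Because each stage is a single FIFO worker, output order matches input order, just as bytes flow through a pipe in order.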

In conclusion, I want to thank Richard Stallman and the FSF for making it possible to rise above the stupid problems caused by closed source. Before the 68k, we had SCO Xenix. Basic utilities like tar and cpio were broken, reported, and never fixed - but GNU software was there with solid and reliable replacements. (Yes, I made donations and even ordered a tape.)

I wrote the same thing for MSVC++ by The+Panther! · 2002-10-12 18:25 · Score: 2

I spent a couple of weeks tinkering with the idea a while back and wrote a server script in Perl that can run on any Win32 box (no compilers or anything necessary to be installed), and wrote a client script that plugs into MSVC++'s IDE. The client parses the build commands and contacts all servers through a UDP broadcast, then connects to each server, preprocesses a file at a time, transfers the processed C/C++ file and the compiler and its related binaries to the server, then sends the build commands to the server as well. All output is captured on the server and sent back to the client.

It worked pretty well, except that I had a lot of trouble getting fork to work in Perl on my Win32 box without crashing, and I needed it for good parallelism. It even fell back to the local machine whenever anything failed remotely, so on a multiprocessor development machine with no servers to connect to, it would actually use all the processors to compile, something MSVC doesn't normally do.
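The fall-back behaviour can be sketched in Python as below. This is not the poster's Perl code: remote_compile and local_compile are hypothetical callables, and a remote failure is assumed to surface as an OSError (which includes ConnectionError and socket timeouts). Threads should be enough here, because each job would mostly wait on an external compiler process rather than run Python code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def compile_with_fallback(source, remote_compile, local_compile):
    """Try the remote server first; compile locally if the remote side fails."""
    try:
        return remote_compile(source)
    except OSError:  # covers ConnectionError, refused connections, timeouts
        return local_compile(source)


def build_all(sources, remote_compile, local_compile, workers=None):
    # Default to one job per local processor, so even with no servers at all
    # a multiprocessor machine compiles several files at once.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(
                lambda src: compile_with_fallback(src, remote_compile, local_compile),
                sources,
            )
        )


if __name__ == "__main__":
    def no_servers(src):
        raise ConnectionError("no build servers reachable")

    print(build_all(["a.cpp", "b.cpp"], no_servers, lambda src: "local:" + src))
```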

The nice thing about the server script is it only has three functions: accept a file, send a requested file, and execute an arbitrary command. So it doesn't really care about what it's doing. At work, we're planning to use it to leech some cycles off the receptionist machines and managers' boxen (we all know they don't do anything all day but email anyway!).
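Because the server only knows three operations, its core is little more than a dispatcher. Here is a minimal Python sketch with assumed names (BuildServer and the put/get/run operation names are hypothetical; the real script was Perl talking over sockets), leaving the network layer out. Note that an "execute an arbitrary command" operation gives any client full control of the machine, so something like this only belongs on a trusted network.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class BuildServer:
    """Accept a file, send a requested file, execute a command. Nothing else."""

    def __init__(self, workdir):
        self.workdir = Path(workdir)

    def put(self, name: str, data: bytes) -> None:
        path = self.workdir / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, name: str) -> bytes:
        return (self.workdir / name).read_bytes()

    def run(self, argv):
        # Capture all output so it can be shipped back to the client.
        proc = subprocess.run(
            argv, cwd=str(self.workdir), capture_output=True, text=True
        )
        return proc.returncode, proc.stdout, proc.stderr

    def handle(self, op: str, *args):
        handlers = {"put": self.put, "get": self.get, "run": self.run}
        if op not in handlers:
            raise ValueError(f"unknown operation: {op!r}")
        return handlers[op](*args)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        server = BuildServer(tmp)
        server.handle("put", "hello.txt", b"hi")
        fetched = server.handle("get", "hello.txt")
        code, out, _err = server.handle(
            "run", [sys.executable, "-c", "print(open('hello.txt').read())"]
        )
        return fetched, code, out


if __name__ == "__main__":
    print(demo())
```

The server never needs to know it is compiling anything; the client decides which files to push and which commands to run.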

The real limiting factor is that preprocessing is relatively slow because of hard-drive seeks when not all the headers fit in the disk cache at once. This is a bigger bottleneck than you'd believe.

For what it's worth, I've learned of another company that concatenated all their .cpp files together and only compiled that one file every time, so their build was a fixed (short) cost, and never hit any header twice. Dirty code that has module-local statics would choke on that technique, but for good code, it's prolly smarter than distributing it.
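Generating such a single-file ("unity") build is easy to script. The sketch below is one way to do it, not necessarily what that company did: instead of literally pasting the files together, it writes one translation unit that #includes every .cpp file, which has the same effect. The sorted file order and the output name unity_build.cpp are assumptions; some projects need a specific order. The caveat about statics is real: if a.cpp and b.cpp each define static int counter;, the two definitions collide once they land in the same translation unit.

```python
from pathlib import Path


def make_unity_source(cpp_files) -> str:
    """Return the text of one .cpp file that #includes all the others."""
    lines = ["// Generated unity translation unit; do not edit by hand."]
    for name in sorted(cpp_files):
        # Forward slashes work in #include paths on both Windows and Unix.
        lines.append(f'#include "{Path(name).as_posix()}"')
    return "\n".join(lines) + "\n"


def write_unity_file(src_dir, out_name="unity_build.cpp") -> Path:
    src = Path(src_dir)
    # Skip the generated file itself so it never includes itself.
    sources = [p.relative_to(src) for p in src.glob("*.cpp") if p.name != out_name]
    out = src / out_name
    out.write_text(make_unity_source(sources))
    return out
```

Compiling only the generated file means each header guarded by include guards is read once per build instead of once per source file.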

--
Any connection between your reality and mine is purely coincidental.

Re:got a distcc windows/linux cross compiler worki by truth_revealed · 2002-10-12 18:35 · Score: 1

I just built something much larger: the Parrot (perl6) VM:
266 MHz Linux box without distcc: 6 minutes, 45 seconds.
266 MHz Linux box + distcc + 2.0 GHz Windows box (with linux cross compiler): 1 minute, 52 seconds.
Who says Windows boxes are good for nothing? ;-)
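For comparison, the speedups implied by the timings posted in this thread (the small test and the Parrot build) work out as follows. This is plain arithmetic on the reported numbers, nothing more.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    return minutes * 60 + seconds


def speedup(local_s: float, distributed_s: float) -> float:
    """How many times faster the distributed build finished."""
    return local_s / distributed_s


if __name__ == "__main__":
    small = speedup(7.66, 3.44)                              # small test files
    parrot = speedup(to_seconds(6, 45), to_seconds(1, 52))  # 405 s vs 112 s
    print(f"small test: {small:.2f}x, Parrot: {parrot:.2f}x")
    # prints: small test: 2.23x, Parrot: 3.62x
```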

A simple patch by ironfroggy · 2002-10-12 19:53 · Score: 1

Would it not be simple to just patch make with an option to run multiple commands in parallel, so that builds would automatically speed up on clusters?

"Oh, I have to build a dozen other targets for this target and none of them depend on each other. I'll run them all together!"

It could even be faster on a single system, depending on certain variables. Ah, well, I'll get to work on it later this week.
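As a reply in this thread points out, GNU Make already does this with -j. The underlying idea is to find targets whose prerequisites are all finished and run them together. Here is a simplified Python sketch that groups targets into "waves" using the standard library's graphlib.TopologicalSorter (Python 3.9+); the function name and the dependency map are made up for illustration. Real make -j is more eager than this: it starts each job as soon as its own prerequisites are done, up to the job limit, rather than waiting for a whole wave to finish.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_waves(deps):
    """Group targets so everything in one wave can be built at the same time.

    deps maps each target to the prerequisites it needs first.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # no unfinished prerequisites left
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    deps = {
        "app": ["a.o", "b.o", "c.o"],
        "a.o": ["a.c"],
        "b.o": ["b.c"],
        "c.o": ["c.c"],
    }
    for i, wave in enumerate(parallel_waves(deps), 1):
        print(i, wave)
```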

--
Question
http://www.ironfroggy.com/

Re:A simple patch by boots@work · 2002-10-12 23:08 · Score: 1

GNU Make already does that. RTFM.

Network Filesystem by laptop006 · 2002-10-12 23:11 · Score: 1

If you're compiling over the net and can't live with NFS (or Samba), look at AFS: it's nicely secure and is designed to work over the Internet.

--
/* FUCK - The F-word is here so that you can grep for it */

Distributed frontend for GNU Make by jensthi · 2002-10-13 07:24 · Score: 1

A distributed front end for compilation/gcc is nice. However, this idea has been implemented before, and in a more general fashion.

PPMAKE works as a distributed front end for GNU make. It works brilliantly (I used it as part of my MSc thesis) and distributes arbitrary compilation work in arbitrary languages to a preconfigured list of hosts.

I remember using a cluster of approximately 30 hosts; this worked well, but it required a fast master.

It is available as a package for at least SuSE.

It requires PVM to be installed, though, but that is usually no problem. :-)

Re:Distributed frontend for GNU Make by boots@work · 2002-10-13 13:25 · Score: 1

Yes, something like this is more general. distcc is only useful for compiling C and C++.

However the drawback is that it requires you to use a shared filesystem and to have the same compilers and headers installed on all machines. If you already have that situation -- /home NFS-mounted, and a single software image -- then it's great.

However, many systems are not so tightly controlled. distcc is much less intrusive to install -- you can distribute jobs to a machine administered by somebody else without requiring them to start using NFS or stop upgrading libraries.

Re:Distributed frontend for GNU Make by cant_get_a_good_nick · 2002-10-24 05:46 · Score: 2

gmake has parallel makes. See the -j option.

Re:Only true for C++ (and some C corner cases) by aoliva · 2002-10-13 11:04 · Score: 1

Not entirely true. Some corner cases, such as the alignment of bitfields, changed between 3.0 and 3.2. Most people don't run into such problems, but the kernel does.
