Slashdot Mirror


Optimizing distcc

IceFox writes "Having fallen in love with distcc and its ability to speed up compiling (which matters to anyone who compiles a lot, like Gentoo users or Linux developers), I recently got the chance to dive deeper into distcc. By itself distcc will decrease your build times, but did you know that if you tweak a few things you can get much better compile times? Through a lot of trial and error, tips from others, profiling, testing and just playing around with distcc, I have put together a nice big article. It shows how developers can get a bigger bang for their buck out of their old computers and distcc with just a few changes."

201 comments

  1. strlen by Anonymous Coward · · Score: 5, Funny

    Yep, root of all evil. strlen. Fix strings and you'll fix everything.

    1. Re:strlen by Swamii · · Score: 1

      Here here. If only they would've known that from the beginning, bankruptcy could've been avoided altogether.

      --
      Tech, life, family, faith: Give me a visit
    2. Re:strlen by Anonymous Coward · · Score: 1, Informative
    3. Re:strlen by Trolling4Dollars · · Score: 1

      What about "here hare"? (Note: You have to be a "Withnail and I Fan" to get this)

    4. Re:strlen by Anonymous Coward · · Score: 1, Funny

      Unfortunately, you are the only one. This makes your joke not funny.

    5. Re:strlen by Anonymous Coward · · Score: 0

      If a joke is told, and no one gets it, was it really a joke?
      If people thousands of years later read a transcript, and get a stillborn joke, can it suddenly become funny?
      Can we distinguish between potential and kinetic joke energy this way?
      I worry about such.

  2. Wow... by JoeLinux · · Score: 5, Funny

    For some reason, "Imagine a beowulf cluster using this" is on-topic.

    This is so weird.

    I must drink now.

    "I do NOT suffer from a mental condition. I'm enjoying every second of it."

    1. Re:Wow... by supabeast! · · Score: 2, Informative

      Actually, that's sort of off-topic because distcc negates the need to set up a beowulf cluster.

  3. Website bit slow... by neonstz · · Score: 5, Funny

    ...maybe you should work on disthttpd next?

    1. Re:Website bit slow... by Wolfier · · Score: 2, Insightful

      It sort of already exists...the name is "Bit Torrent"...

    2. Re:Website bit slow... by Anonymous Coward · · Score: 0, Funny

      And MAYBE you should work on dipshitpd? [ZING!]

    3. Re:Website bit slow... by Cyberop5 · · Score: 1

      A better example would be freenet. It's a distributed web where pages are ranked by popularity. A slashdotting, in theory, would cause more clients to replicate the site... an automirroring tool. I don't think it works quite the same way as BitTorrent.

      --
      Urgo: "I want to live. I want to experience the universe and I want to eat pie!"
      Jack: "Who doesn't??"
    4. Re:Website bit slow... by OneEyedApe · · Score: 1

      Bittorrent is more a distftp (with a few added features).

      --
      Life sucks, but death doesn't put out at all....
      --Thomas J. Kopp
  4. Nice big article by wildzeke · · Score: 5, Funny

    By the time I read the article, my kdelibs was compiled.

  5. /.-ed already? by Lord+of+Ironhand · · Score: 4, Funny

    Looks like that server won't be doing much compiling soon...

    1. Re:/.-ed already? by Anonymous Coward · · Score: 0, Offtopic

      Just a general question here.
      has slashdot ever been /.ed?
      is that even possible?

      (how much do you wanna bet that this post will be classed "offtopic"

    2. Re:/.-ed already? by Thud457 · · Score: 2, Informative

      A: 9/11/2001.

      --

      the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

    3. Re:/.-ed already? by Cheeziologist · · Score: 1

      even the mighty computer science house here at RIT can't take the brutal onslaught that is a slashdotting. I can never look up at their floor with the same longing to be a member ever again.

    4. Re:/.-ed already? by Bombcar · · Score: 2, Funny
      The site now says:


      What once was here
      is here no more;
      Our slashdotted luser
      has been shown the door.
    5. Re:/.-ed already? by voidptr · · Score: 1

      It's not so much that it couldn't handle it as that one of the sysadmins with a stick up his ass wanted to make a point about putting innocent machines on the front page of slashdot without warning the admins first.

      --
      This .sig for unofficial government use only. Official use subject to $500 fine.
  6. Blue by Anonymous Coward · · Score: 0

    is it just me or is this page blue?

  7. anal retentive admin by maxbang · · Score: 3, Funny

    From the article:

    I even found different colored cable for the different areas of my cube.

    I wonder if he also sealed the empty packaging, waste paper, and dead hardware in neat little foil packets before disposing of them in the proper receptacle, which, of course, sits right next to the cozy for his server. ;)

    --
    I also reply below your current threshold.
    1. Re:anal retentive admin by kidlinux · · Score: 1

      Never mind having 15 computers in a small space. I have 3 and I'd use colour coded cables and labels.

      Big messes of cables and wires are a real pain in the ass.

      --
      -kidlinux.
  8. Reliefe for the /. site by Anonymous Coward · · Score: 2, Informative

    distcc optimizations - March 30th 2004
    and how to compile kdelibs from scratch in six minutes

    If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc sending compile jobs to other computers when free, thus distributing compiles and dramatically decreasing build times. Best of all it is very easy to set up.
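
    A minimal sketch of what "sits between make and gcc" looks like in practice. It only builds the environment and command line for a distributed build and runs nothing. DISTCC_HOSTS and the CC=distcc style of invocation are standard distcc usage; the hostnames, the helper name distcc_build_plan, and the "two jobs per CPU" default are illustrative assumptions, not values taken from the article.

```python
import shlex


def distcc_build_plan(hosts, cpus_per_host=1, jobs_per_cpu=2):
    """Return (env, argv) for a distributed make run; nothing is executed.

    hosts          -- machines running the distcc daemon (hypothetical names)
    cpus_per_host  -- assumed CPU count per machine
    jobs_per_cpu   -- assumed rule of thumb for sizing make's -j value
    """
    if not hosts:
        raise ValueError("need at least one host")
    # distcc reads its list of build machines from DISTCC_HOSTS.
    env = {"DISTCC_HOSTS": " ".join(hosts)}
    jobs = len(hosts) * cpus_per_host * jobs_per_cpu
    # make hands every compile to distcc, which forwards it to a free host.
    argv = ["make", f"-j{jobs}", "CC=distcc gcc", "CXX=distcc g++"]
    return env, argv


if __name__ == "__main__":
    env, argv = distcc_build_plan(["localhost", "ophelia", "juliet", "portia"])
    print("DISTCC_HOSTS=" + shlex.quote(env["DISTCC_HOSTS"]), shlex.join(argv))
```

    To actually start a build, the returned pair could be passed to subprocess.run(argv, env={**os.environ, **env}). The right -j value depends on the farm, so it is worth measuring rather than trusting the default.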

    This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.

    Before getting started: In conjunction with distcc there is another tool called ccache, a caching pre-processor for C/C++ compilers, that I won't be discussing here. For all of the tests it was turned off to properly measure distcc's performance, but developers should also know about this tool and use it in conjunction with distcc for the best results and shortest compile times. There is a link to the homepage at the end of this article.

    Farm Groundwork and Setup

    As is the normal circle of life for computers in a corporate environment, I was recently lucky enough to go through a whole stack of computers before they were recycled. From the initial lot of forty or so computers I ended up with twelve desktop computers that ranged from 500MHz to 866MHz. The main limit for my choosing dealt with the fact that I only had room in my cube for fifteen computers. With that in mind I chose the computers with the best CPUs. Much of the RAM was evened out so that almost all of the final twelve have 256MB. Fast computers with bad components had the bad parts swapped out for good components from the slower machines. Each computer was set up to boot from the CD-ROM and not output errors when booting if there wasn't a keyboard/mouse/monitor. They were also set to turn on when connected to power.

    Having enough network administration experience to know better, I labeled all of the computers and the power cord and network cord attached to each. I even found different colored cable for the different areas of my cube. The first label specified the CPU speed and RAM size so later, when I was given faster computers, finding the slowest machine would be easy. The second label on each machine was the name of the machine, which was one of the many female characters from Shakespeare's plays. On the server side a DHCP server was set up to match each computer with its name and IP for easy diagnosis of problems down the line.

    For the operating system I used distccKNOPPIX. distccKNOPPIX is a very small Linux distribution that is 40MB in size and resides on a CD. It does little more than boot, get the machine online and then start the distcc daemon. Because it didn't use the hard disk at all, preparation of the computers required little more than testing to make sure that they all booted off the CD and could get an IP.

    Initially, all twelve computers (plus the build master) were plugged into a hub and switch that I had borrowed from a friend. The build master is a 2.7GHz Linux box with two network cards. The first network card pointed to the Internet and the second card pointed to the build network. This was done to reduce the network latency as much as possible by removing other network traffic. More on this later though.

    A note on power and noise: the computers all have on-board components. Any unnecessary PCI cards that were found in the machines were removed. Because nothing is installed on the hard disks they were set to spin down shortly after the machines are turned on. (I debated just unplugging the hard disk, but wanted to leave the option for installation open for later.) After booting up and after the first compile, when gcc is read off the CD, the CD-ROM also spins down. With no extra components and no spinning CD-ROM or hard disk drives, the noise and heat level in my cube really didn't change any that I could notice (there were of course jokes galore by everyone about saunas and jet planes when I was setting up

    1. Re:Reliefe for the /. site by Homology · · Score: 1
      The main limit for my choosing dealt with the fact that I only had room in my cube for fifteen computers.

      I guess he doesn't mind a lot of noise...

    2. Re:Reliefe for the /. site by IceFox · · Score: 2, Informative

      Hmm didn't finish reading the article did you (that was in the parent poster!)? If you had you would see that in fact the noise level didn't rise in my cube. :D -Benjamin Meyer

      --
      Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
    3. Re:Reliefe for the /. site by Homology · · Score: 1
      Nope, I skimmed it since this is Slashdot after all :D

      Putting 12 older PCs in the cubicle and having the same level of noise could mean that either you put some work into making them quiet, or it's quite noisy already :D

    4. Re:Reliefe for the /. site by gl4ss · · Score: 1

      ..or that they were quite silent to begin with.

      there's quite a few of the pentium2 and pentium3 era PCs that only had one fan in the whole system (some compaqs at least).

      --
      world was created 5 seconds before this post as it is.
  9. Servers last words by DR+SoB · · Score: 1, Funny

    After being posted on /.

    "Dieing Ben-ja-min" - Short Circuit 2

    --
    Mod +5 Drunk
  10. ccache by Lord+of+Ironhand · · Score: 4, Interesting

    ccache is also nice for optimizing compiling. He probably mentioned it in the article, but since it seems /.-ed I wouldn't know... and by the time you've got both distcc and ccache running the article might be available again so you can read if you did it the right way :-)

    1. Re:ccache by aridhol · · Score: 2, Informative

      Yeah, he says that ccache would speed up the compilation, but he specifically disabled it so it wouldn't interfere with his timings (later runs would appear more efficient than they should be).

      --
      I can't say that I don't give a fuck. I've just run out of fuck to give.
  11. Unfortunately by Anonymous Coward · · Score: 0

    distHTTP isn't quite there yet.

  12. Copy of my article... by IceFox · · Score: 4, Redundant
    poor web server... I thought it could handle it...

    distcc optimizations - March 30th 2004

    and how to compile kdelibs from scratch in six minutes

    If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc sending compile jobs to other computers when free, thus distributing compiles and dramatically decreasing build times. Best of all it is very easy to set up.

    This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.

    Before getting started: In conjunction with distcc there is another tool called ccache, a caching pre-processor for C/C++ compilers, that I won't be discussing here. For all of the tests it was turned off to properly measure distcc's performance, but developers should also know about this tool and use it in conjunction with distcc for the best results and shortest compile times. There is a link to the homepage at the end of this article.

    Farm Groundwork and Setup

    As is the normal circle of life for computers in a corporate environment, I was recently lucky enough to go through a whole stack of computers before they were recycled. From the initial lot of forty or so computers I ended up with twelve desktop computers that ranged from 500MHz to 866MHz. The main limit for my choosing dealt with the fact that I only had room in my cube for fifteen computers. With that in mind I chose the computers with the best CPUs. Much of the RAM was evened out so that almost all of the final twelve have 256MB. Fast computers with bad components had the bad parts swapped out for good components from the slower machines. Each computer was set up to boot from the CD-ROM and not output errors when booting if there wasn't a keyboard/mouse/monitor. They were also set to turn on when connected to power.

    Having enough network administration experience to know better, I labeled all of the computers and the power cord and network cord attached to each. I even found different colored cable for the different areas of my cube. The first label specified the CPU speed and RAM size so later, when I was given faster computers, finding the slowest machine would be easy. The second label on each machine was the name of the machine, which was one of the many female characters from Shakespeare's plays. On the server side a DHCP server was set up to match each computer with its name and IP for easy diagnosis of problems down the line.

    For the operating system I used distccKNOPPIX. distccKNOPPIX is a very small Linux distribution that is 40MB in size and resides on a CD. It does little more than boot, get the machine online and then start the distcc daemon. Because it didn't use the hard disk at all, preparation of the computers required little more than testing to make sure that they all booted off the CD and could get an IP.

    Initially, all twelve computers (plus the build master) were plugged into a hub and switch that I had borrowed from a friend. The build master is a 2.7GHz Linux box with two network cards. The first network card pointed to the Internet and the second card pointed to the build network. This was done to reduce the network latency as much as possible by removing other network traffic. More on this later though.

    A note on power and noise: the computers all have on-board components. Any unnecessary PCI cards that were found in the machines were removed. Because nothing is installed on the hard disks they were set to spin down shortly after the machines are turned on. (I debated just unplugging the hard disk, but wanted to leave the option for installation open for later.) After booting up and after the first compile, when gcc is read off the CD, the CD-ROM also spins down. With no extra components and no spinning CD-ROM or hard disk drives, the noise and heat level in my cube really didn't change any that I c

    --
    Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
    1. Re:Copy of my article... by xSquaredAdmin · · Score: 0, Troll

      poor web server... I thought it could handle it...

      Fool! Nothing can stand before the might of /.!

      Well, at least not anything you can get your hands on.

      --
      Crushing dreams at the speed of sarcasm
    2. Re:Copy of my article... by Anonymous Coward · · Score: 0
      (Score:0, Redundant)

      HAH, EAT IT, KARMA WHORE!


      ;-) look at the UID, folks...

    3. Re:Copy of my article... by Lord+of+Ironhand · · Score: 1, Insightful
      Come on mods, he's the author of the article! At least give him some points for that!

      @IceFox: thanks. Great article.

    4. Re:Copy of my article... by Anonymous Coward · · Score: 0

      Stupid shit. You should have told someone you were gonna post to slashdot.

    5. Re:Copy of my article... by Anonymous Coward · · Score: 0
      poor web server... I thought it could handle it...

      Ahahaha!!!

      *wipes tears*

      Come on, do another!

    6. Re:Copy of my article... by Anonymous Coward · · Score: 0

      Score: 5, Redundant.
      Only on slashdot, where redundancy is an art.

    7. Re:Copy of my article... by Anonymous Coward · · Score: 0

      He already gets mods for having the story accepted.

    8. Re:Copy of my article... by Anonymous Coward · · Score: 0

      Egads! 5, Redundant!

      40% Redundant
      40% Informative
      20% Underrated

  13. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0

    It's redundant when it has already been posted.... Of course, both comments have the same timestamp, so it was a tossup as to which would win....

  14. Distccd for cygwin by aberant · · Score: 5, Informative

    My life changed the day I found out I could get my super fast P4 Windows XP box to compile for my slow Linux box. Distcc for cygwin is a miracle. Check out the thread at Gentoo forums

    1. Re:Distccd for cygwin by bee-yotch · · Score: 2

      This is probably the most productive thing I've used windows for in the past two years ;-).

    2. Re:Distccd for cygwin by Technonotice_Dom · · Score: 1

      Or use distccKNOPPIX and no need to boot into the horrid, unreliable beast at all :)

    3. Re:Distccd for cygwin by aberant · · Score: 1

      some people have audio software that will run in nothing other than windows xp...

    4. Re:Distccd for cygwin by Lord+Ender · · Score: 1

      I find it offensive that you would run Windows on your fastest home machine. Linux should always have the best hardware. You Sir, insult me. /joke

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    5. Re:Distccd for cygwin by Anonymous Coward · · Score: 0

      Hmm, sounds like they've got a problem.

    6. Re:Distccd for cygwin by notanatheist · · Score: 1

      /joke? Really now, Windows is actually run most often on my second fastest box. Box 1 & 3 have a dedicated fiber connection with an overclocked 3.5GHz HT P4 in one and XP2200 in the other. Keeps me going happy. Windows is quite undeserving of my best hardware. Want to know more? eviltechmonkey.com/mymach.html

    7. Re:Distccd for cygwin by Andy+Dodd · · Score: 1

      It's a PAIN to install and get running, but DAMN is it worth it!

      A similar technique to the distcc + cygwin install can be used to allow a distcc host to provide a GCC version other than its system GCC version. For example, my setup:
      1.7 GHz P4-M (Gentoo box, always is the controlling node)
      1.1 GHz Athlon (RedHat 7.3, sys GCC is 2.96, but I have a 3.3 tree in another location that won't interfere with the 2.96 tree)
      WinXP box with an Athlon XP 1?00+

      The XP box has 256M RAM, the other two 512M. Works great.

      --
      retrorocket.o not found, launch anyway?
  15. Hmmmm by Anonymous Coward · · Score: 0

    I see that distCommentsAboutSlashdotting is working perfectly.

  16. sounds like its from KIDS (the movie) by Anonymous Coward · · Score: 0

    Telly: Do you have any distcc?

    shopkeeper: distcc?

    You know how the rest goes...

  17. I wonder... by jhouserizer · · Score: 1, Funny

    I wonder if all the time he'll save in his compiling will add up to the amount of time he spent figuring out how to speed it up + the time spent writing this article?

    1. Re:I wonder... by timeOday · · Score: 4, Insightful

      If he was only interested in helping himself he wouldn't have bothered with a nice writeup for all of us to read.

    2. Re:I wonder... by Lord+of+Ironhand · · Score: 4, Insightful

      If everyone measured the value of his actions only by the time it will save him/herself, there probably wouldn't be much of a free software community these days.

    3. Re:I wonder... by Anonymous Coward · · Score: 0

      Did you read the article? Yes, he compiles kdelibs.

    4. Re:I wonder... by jhouserizer · · Score: 1

      Gee wiz... it was a joke!

      A lame joke, but a joke none-the-less.

      I fully understand that he is doing this to benefit more than only the amount of his own spare time.... I spend several hours a week on open source contributions myself.

      I just thought that it was funny that he mentioned how much time he spent coming up with a scheme to save time...!

  18. Martin Pool interview by Wise+Dragon · · Score: 5, Informative

    Martin Pool, the brains behind distcc, was interviewed by ZDNet yesterday. How timely.

    http://web.zdnet.com.au/builder/program/work/story/0,2000034960,20283318-1,00.htm

    1. Re:Martin Pool interview by Wise+Dragon · · Score: 2, Informative
  19. Mirror by Rufus211 · · Score: 4, Informative

    I feel like burning my new site in a bit =)

    http://hackish.org/~rufus/distcc.php.html

    1. Re:Mirror by revmoo · · Score: 1

      I notice a connection refused. Would you mind sharing what the bandwidth usage was?

      I've wanted to mirror files for /. lately, but I don't want to swamp my server.

      --
      I would expect such blatant racism on Fark, but on Slashdot? Mods please ban this asshole.
    2. Re:Mirror by Rufus211 · · Score: 1

      Bandwidth was about, um, 1mb? I took the server down as an unrelated event to get software raid working.

  20. Re:Article Text (Slashdotted Server) by Lord+of+Ironhand · · Score: 0, Redundant

    It was rightfully modded redundant because the article had already been posted just seconds earlier. As well as by the author himself just a few posts down (we should at least give him some mod points for the interesting article...)

  21. Gentoo Impact(s) by ViceClown · · Score: 1

    This was a great read... which I was fortunate enough to do before this poor guy's machine got /.ed. Anyway, an adaptation of this article aimed at specific users or tasks (developers, Gentoo users, etc) would be awesome! Kudos for the writeup. Can't wait to go home and try it out!

    --
    Have a Happy.
    1. Re:Gentoo Impact(s) by y2dt · · Score: 5, Informative

      official gentoo distcc guide:
      http://www.gentoo.org/doc/en/distcc.xml

    2. Re:Gentoo Impact(s) by ViceClown · · Score: 1

      Ah ha!

      --
      Have a Happy.
    3. Re:Gentoo Impact(s) by Anonymous Coward · · Score: 1, Funny
      Un-official Gentoo Slashdot Comment FAQ:

      Q: I'm considering posting a comment to slashdot about Gentoo. When is it appropriate to introduce Gentoo into a discussion?
      A: It's always appropriate to introduce Gentoo into any conversation. As far as Gentoo is concerned, there is no such thing as "offtopic".

      Q: I'm concerned that other slashbotters might tire of my Gentoo blathering. Should I curtail the number of Gentoo related posts I make?
      A: Of course not! Everyone loves to hear about Gentoo! No one tires of mythical anecdotes of Gentoo's speed and power!

      Q: Someone told me that Gentoo is a copy of FreeBSD. I also heard BSD is dying. I don't know what BSD is, but should I be concerned that my precious Gentoo might be dying too?
      A: Rest easy, young Gentoophile. While it is true that BSD is dying, Gentoo is not a copy of BSD. In fact, Gentoo is the one true operating system and predates all known variants of BSD, Linux, and AmigaOS. It is rumored that Jesus wrote the first version of Gentoo in a Passionate hacking session circa 40 AD.

      Q: I've been using Gentoo/Linucks for over 2 months, so I consider my self fairly l44t as a hacker. I've only been using these compile flags in my emerging scripts: -0 -g -march=286. Can you recommend some more flag for even fasters gentoo compiles?
      A: Those are a good start, you might try these: -Oyeah -enable-gentoo-registers -funroll-all-the-loopies -fomit-random-instructions

  22. Re:Article Text (Slashdotted Server) by Fedallah · · Score: 0, Offtopic

    I think the other guy beat mine by a few seconds. My post should be modded down redundant accordingly. I'd rather only see one copy anyways.

  23. behind the XCode curtain by pohl · · Score: 4, Insightful

    This is cool...I learned something on slashdot today. On a hunch I got a bash shell on my OSX box at home and typed "dist--", and lo there be distcc already installed and ready to go. That must be what they use for distributed builds in XCode

    --

    The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...

    1. Re:behind the XCode curtain by Anonymous Coward · · Score: 5, Informative

      Yup, look at the X code preferences for distributed builds. The cool part is they use Rendezvous to automatically find machines to send work. You can set your box to use these others and/or offer service to others. Also on dual processor boxes it will treat them as two machines and do two compiles at once.

      Anyway, you can see distcc running when you have X code enabled for distributed builds and running.

      --jim

    2. Re:behind the XCode curtain by jcr · · Score: 4, Interesting

      Yes, it is. This was described in the XCode session at WWDC last year.

      I had a project that took about 15 minutes to build on my Dual G4. I turned on distributed builds in XCode, and it dropped to 2 minutes. Turns out that about a dozen of my colleagues on my subnet are running the same build of our developer tools as I am.

      distcc rocks.. Whoever thought it up should get the appropriate "special award for extreme cleverness."

      -jcr

      --
      The only title of honor that a tyrant can grant is "Enemy of the State."
    3. Re:behind the XCode curtain by marklark · · Score: 1

      If you check apple.slashdot.org, you'll find a discussion of XGrid Technology Preview 2 - still a low bandwidth discussion. :^)

    4. Re:behind the XCode curtain by Anonymous Coward · · Score: 0

      Hey, that makes me wonder if that would work with Darwin too, so that my faster x86s could compile stuff for my iBook (and use Rendezvous to do it!)

  24. Mirror by after · · Score: 2, Informative

    The article is loading really, really slooooow, I was able to get a html-only copy of it.

  25. Re:Martin Pool interview - clickable link by Poisonous+Drool · · Score: 4, Informative
  26. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0

    You're right. This should have been modded troll. Posting an article should never be modded up and when posted using a user account (i.e. Karma Whoring) it should be modded down. Karma whoring should never be rewarded.

  27. Would using a recent version of distcc... by Saiyine · · Score: 0

    ... improve the compile time?

    I just can't understand how he goes through all that hassle and then uses distccknoppix, a VERY outdated livecd, with the last version dated July 3rd 2003?!?!?

    --
    Hosting 20G hd, 1Tb bw! ssh $7.95
  28. works great for Gentoo by Chuck+Bucket · · Score: 1

    currently running on 3 boxes at home, makes compiles fun, since the other machines get to help out. here at work it's another story, beefy Linux boxes in the server room help me out when I need to do emerges for my workstation or server. it's all good fun.

    CB

  29. Improving builds. by Anonymous Coward · · Score: 2, Informative

    (1) Use Scons
    (2) Use --jobs=2 (or however many processors you have).

    Build times will be greatly improved - and it's cross platform as well.
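
    A small sketch of the second tip: deriving the --jobs value from the number of processors instead of hard-coding 2. os.cpu_count() is standard Python; the helper name jobs_flag is made up for this example, and the commented-out scons call is shown only as one way the flag might be used.

```python
import os


def jobs_flag(cpu_count=None):
    """Build a --jobs=N flag, defaulting N to this machine's CPU count."""
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("job count must be at least 1")
    return f"--jobs={n}"


if __name__ == "__main__":
    print(jobs_flag())
    # Possible use (assumes scons is installed and on PATH):
    # import subprocess
    # subprocess.run(["scons", jobs_flag()], check=True)
```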

    In my opinion - especially if you have a complicated project - distcc isn't worth it. The machine takes so long pre-processing everything (including header files) - that you loose whatever advantages you might have with offloading the actual compilation work. It's especially useless with MSVC once you start using precompiled headers.
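
    The preprocessing argument above can be put into rough numbers with an Amdahl's-law style estimate. This is a simplified model resting on two assumptions that are not from the comment: preprocessing stays serial on the master, and the remaining compile work splits evenly across workers. The function name and the example fractions are illustrative only.

```python
def distributed_speedup(preprocess_fraction, workers):
    """Estimated speedup when only the non-preprocessing share is distributed.

    preprocess_fraction -- share of single-machine build time spent
                           preprocessing on the master (0.0 to 1.0)
    workers             -- number of machines sharing the compile work
    """
    if not 0.0 <= preprocess_fraction <= 1.0:
        raise ValueError("preprocess_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    distributed_share = 1.0 - preprocess_fraction
    return 1.0 / (preprocess_fraction + distributed_share / workers)


if __name__ == "__main__":
    for fraction in (0.1, 0.3, 0.5):
        estimate = distributed_speedup(fraction, workers=12)
        print(f"preprocess share {fraction:.0%}: about {estimate:.1f}x")
```

    Under this model the master's serial share caps the gain no matter how many machines are added, which is the commenter's point; how large that share really is for a given project has to be measured.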

    1. Re:Improving builds. by Anonymous Coward · · Score: 0

      lose/win
      loose/tight

      I'm not a language nazi, but this is way too common around here. Geeks are supposed to be smart, guys. (And Unix geeks are supposed to be anal-retentive)

  30. Re:Article Text (Slashdotted Server) by djh101010 · · Score: 0, Redundant

    If it was posted _seconds earlier_, then how could he know he was being redundant?

    The purpose of that tag is to say "Yes, we know, people keep making that point over and over, please read the thread". In this case, neither applies.

  31. Re:Article Text (Slashdotted Server) by djh101010 · · Score: 2, Funny

    Are there actually regular participants of Slashdot whose karma _isn't_ listed as excellent? If that's the case, how can "karma whoring" come into it at all? It's not like you get promoted to "excellenter" or "excellentest" or something.

  32. Or... You could do it properly. by Moderation+abuser · · Score: 4, Informative

    Install Sun Grid Engine[1] since it's free and now open source and then not only do you get qmake for distributed builds but you also get a general purpose distributed processing system. And hey! It even has the current buzzword "grid" in the title so your PHB will love you.

    [1] http://gridengine.sunsource.net/

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:Or... You could do it properly. by Atzanteol · · Score: 1

      I highly doubt that Sun's 'Grid Engine' is as easy to install and use as distcc is though. Though it does sound interesting.

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
    2. Re:Or... You could do it properly. by Moderation+abuser · · Score: 1

      Grid engine's a doddle to install and use. It's also more useful than a system limited to running distributed compiles.

      --
      Government of the people, by corporate executives, for corporate profits.
    3. Re:Or... You could do it properly. by Anonymous Coward · · Score: 1, Informative

      I have never tried qmake on grid engine, but it is horribly slow at scheduling tasks to run on our cluster even with a limited number of jobs in the queue. I wonder what the performance is like compared to distcc. If you are performing nightly builds that take 10 hours to complete then it is no problem. But if you are doing interactive building where you are expecting under 5 min builds then the scheduling could be a problem.

    4. Re:Or... You could do it properly. by Spunk · · Score: 1

      Grid engine's a doddle to install and use

      doddle... Is that a good or a bad thing?

    5. Re:Or... You could do it properly. by cant_get_a_good_nick · · Score: 1

      British slang.
      s/a doddle to/a piece of cake to/g
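
      For anyone who doesn't read sed, the substitution above behaves like this Python sketch. The function name translate_slang is made up for the example.

```python
import re


def translate_slang(text):
    """Apply the same global substitution as the sed expression above."""
    return re.sub(r"a doddle to", "a piece of cake to", text)


if __name__ == "__main__":
    print(translate_slang("Grid engine's a doddle to install and use"))
```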

    6. Re:Or... You could do it properly. by Moderation+abuser · · Score: 1

      Maybe you need to send your sysadmin on the sysadmin course. It's free:

      http://suned.sun.com/US/catalog/courses/WE-1600-90.html

      Our grid gets jobs out to an execution host and started in less than a second. All of our applications are distributed out over the execution nodes; Editors, word processors, spreadsheets, The Gimp, software builds, *everything*.

      In fact, the less than 1 second latency incurred submitting a grid job is easily and by far overcome by the reduction in time given by starting a process on a machine which has an already running instance of that application. Netscape; sub 3 seconds, Open Office; sub 4 seconds. Emacs; only 50 or 60 seconds. You can achieve this by managing the queues.

      We use the grid interactively for everything.

      But then... I make a point of knowing what I'm doing.

      --
      Government of the people, by corporate executives, for corporate profits.
  33. Why wasn't a factorial experiment used? by alptraum · · Score: 4, Informative

    Sigh, another experiment that could have benefitted greatly from factorial experimentation. If you're unfamiliar with DOE, here is a basic introduction courtesy of NIST:

    http://www.itl.nist.gov/div898/handbook/pri/section1/pri11.htm

    In this case we have a variety of factors and a single response, the elapsed compilation time, which we want to minimize. Instead of looking at factors individually, a factorial DOE would have allowed interactions to be analyzed and a global optimum to be sought, rather than optimizing individual factors and then tossing them all together. It doesn't work that way a lot (or most) of the time.

    If the author of this article is present: Why wasn't a factorial experiment used?
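
    A full factorial design simply runs every combination of factor levels, which is what lets interactions be estimated. A minimal sketch in Python, assuming four illustrative factors (the factor names and levels below are hypothetical, loosely drawn from the tuning knobs discussed in this thread, not the article's actual test matrix):

```python
from itertools import product

# Hypothetical factors and levels; substitute the knobs you actually vary.
FACTORS = {
    "make_tool": ["make", "unsermake"],
    "lzo_compression": [False, True],
    "localhost_position": ["first", "middle", "last"],
    "jobs_per_cpu": [1, 2],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

runs = full_factorial(FACTORS)
print(len(runs))  # 2 * 2 * 3 * 2 = 24 runs
print(runs[0])
```

    Each run would then be timed (ideally more than once) and the elapsed times fed to whatever statistics package you prefer; the enumeration itself is the easy part.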

    1. Re:Why wasn't a factorial experiment used? by Anonymous Coward · · Score: 0

      Maybe because he's a system administrator and not a research scientist?

    2. Re:Why wasn't a factorial experiment used? by alptraum · · Score: 2, Insightful
      Maybe because he's a system administrator and not a research scientist?

      That isn't a reason, you use the best solutions that are available for solving a problem type regardless of your position.

      DOE is widely used, especially in manufacturing processes; however, with just basic knowledge of DOE it is easy to see the applications to non-manufacturing processes as well. DOE is readily available in just about any statistics software worth using (R, SAS, Minitab, S-Plus, etc.), so even if you don't have much formal knowledge, designs are very easy to implement.

      Honestly, doesn't anybody pay attention in statistics classes? I mean for goodness sakes, I saved a major semiconductor producer millions by solving a problem that none of their engineering staff knew how to solve, and all I did was 1-way ANOVA. You have no idea how long I laughed over that one. 1-way ANOVA, that's covered in like first semester basic stats that engineers and physical scientists take, but from what I've heard, at least at my school, the vast majority just blow the class off, and then they get their ass handed to them by the stats guys when they're out in the real world. Don't let it happen to you.

    3. Re:Why wasn't a factorial experiment used? by Anonymous Coward · · Score: 1, Funny

      Stats... right, I'll add that to my enormous list of "things other people think is incredibly important and that I really ought to know because otherwise I am an indisputable dope." One day, I'll wipe my ass with this list.

    4. Re:Why wasn't a factorial experiment used? by DarkMan · · Score: 3, Interesting

      Probably because it wasn't needed. And secondly, factorial DOE isn't as good as you're implying it to be.

      Factorial DOE is useful if you have multiple measurable, continuous or quasi-continuous [0] factors, and want to optimise - particularly when there is some trade off. In this case, however, most of the variables that were altered were clearly discrete (This version of make, or that version of make, for example), or it was clear that the optimum was at an extreme (More CPU speed is always good, for example).

      So, the only factor I can see that would be suitable for a factorial DOE is the number of machines in the farm. Except, each machine is different, so that's effectively an n-dimensional set, with 2 options on each dimension, for n machines. If you're going to do the stats, you'd want to do them properly, so no handwaving them all together there.

      Plus, this is a deterministic situation. There is no real need for empirical analysis - you can do it all from first principles, which would be much more efficient, I think. And, indeed, that's what the author did - by looking at the theoretical background of it all, to use different makes and so on, to optimise.

      Finally, if you think that a factorial DOE will get you a global optimum solution, then you're sadly mistaken. It's a good procedure for optimising, and it can avoid some local minima - but it's not guaranteed to find a global minimum. The only guaranteed method I'm aware of is simulated annealing - and if you've got a faster method, I, and a large number of people doing numerical calculations, would love to hear it.

      Oh, and the aim here was _not_ to find a global minimum. It was to get something that was good enough. Trying for better than that is wasted effort.

      [0] For example, the set of integers from 0 to 1000 is quasi-continuous. It's not really continuous, but it's close enough for real purposes.

  34. Electric Cloud by Anonymous Coward · · Score: 3, Informative

    Yes, distcc is nice, but anyone with a really big build (say like hours long) must take a look at John Ousterhout's company Electric Cloud (yeah, John Ousterhout as in Tcl) here. They've built this replacement for gmake that runs the jobs in parallel but is smarter than distcc because it can break open all the recursive makes and run _everything_ in parallel and it works cross platform too. It's $$$ and not OSS :-) but designed to be ultrareliable.

    1. Re:Electric Cloud by boots@work · · Score: 1

      Or just install SCons. Great parallelism, smarter rebuilds, cross-platform, more reliable than make, and far faster than automake/libtool. Works well with distcc too.

      run _everything_ in parallel

      What, even things that shouldn't be parallel? Screw that.

      Damned if I'm letting an electric cloud near my machine room.

  35. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0

    perhaps you should look slightly above this particular report of the article (Which was marked redundant) and see that it *IS* redundant, considering there is another post with the exact same article that was posted before it...

  36. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 3, Informative

    You must be new so I will explain it to you. There is a class of users that regularly Karma Whore and get their Karma maxed out and then proceed to burn Karma by posting things like goatse.cx at +1. They do this not only to annoy people but to prove the flaws with the moderation system. While this guy may not be one of those people it is important to not reward somebody for posting an article. Users can easily post the article anonymously and avoid this issue altogether.

  37. PHP article? by Vellmont · · Score: 3, Insightful

    If you knew you were going to be slashdotted, wouldn't you link to a static version of the article instead of one running a PHP script?

    --
    AccountKiller
  38. Re:Article Text (Slashdotted Server) by Lord+of+Ironhand · · Score: 2, Interesting
    If it was posted _seconds earlier_, then how could he know he was being redundant?

    He couldn't. It's simply a risk you take when posting the article. The moderation system is intended to improve things for the reader, not to judge his (undoubtedly good) intentions. You have a point though, maybe Redundant moderations shouldn't decrease karma, just like Funny doesn't increase it.

    btw, posting the article as non-AC is viewed by many as karma whoring, so it's not recommended anyway.

  39. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0
    Hey mods, a repost of text of a slashdotted site is not redundant, it's informative.

    The redundant mods are just being proactive. In a few hours when the site has recovered from the traffic spike, the repost will indeed be redundant.

  40. php huh by felix9x · · Score: 0, Offtopic

    He thought he could survive a /. with a PHP-generated page? That reduces the throughput by what, 10 times?

  41. Sun's CC by Anonymous Coward · · Score: 0

    Guess this guy never heard of SUN's 'dmake' tool. He's only about 8 years late.

  42. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 1, Funny

    Please, people, please, will you keep redudantly repeating what the definition of redundant means over and over again several times. It still hasn't sunk in yet. It still hasn't sunk in yet. The redunant part that is. By which I mean the definition of the meaning of what the word redudant means.

  43. Re:Article Text (Slashdotted Server) by Fedallah · · Score: 1

    Because I was logged in, and didn't even think about hitting the 'Post Anonymously' button. I don't care about Karma, as alien a notion as that may be. Hence why I don't care if this gets modded Redundant, Informative, or Troll. I was simply trying to enable people to have an on-topic conversation when the original source of information was unavailable.

    Apparently I should know better since I have a low userid, but to be honest, I don't post much, and I didn't know better, so I apologize.

  44. Re:pedantic asshole comment by Anonymous Coward · · Score: 0

    Don't you have any thing better too do with you're time then too point out grammaticle misteaks and mispellings? It will effect nothing and its boring.

  45. Missed the best point by MerlynEmrys67 · · Score: 4, Informative
    He completely ignored the usage of distcc and ccache together. The pair of applications make for a huge win.

    There are some problems though: which do you do first, ccache or distcc? (The answer on my benchmarks is ccache; if it isn't in the cache, send it on the network.) How fast is your "build" machine? This is critical. The build machine is responsible for preprocessing the file, checking if it is in the cache and then sending it out to be turned into an object. Especially when you combine ccache (most of your builds are just the same files over and over, with very few "changed" files) and distcc, most of your time is spent in the first-pass compiler.

    In our environment we had boatloads of dual XEON machines around - they made wonderful build machines, and it didn't hurt that we connected them with Gig Ethernet either. Did wonders for our build times.

    Overall, distcc and ccache are wonderful tools that should be in every large compile environment - making compiles that used to take days take simple minutes. But you want to make sure that the dependency between ccache and distcc works optimally in your environment.

    --
    I have mod points and I am not afraid to use them
    1. Re:Missed the best point by IceFox · · Score: 2, Informative
      He completely ignored the usage of distcc and ccache together. The pair of applications make for a huge win.

      Actually I mentioned it in the first paragraph...

      --
      Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
    2. Re:Missed the best point by MerlynEmrys67 · · Score: 1
      Yeah - his mention was he was ignoring it...
      there is another tool called ccache, which is a caching pre-processor to C/C++ compilers, that I wont be discussing here. For all of the tests it was turned off to properly determine distcc's performance, but developers should also know about this tool and using it in conjunction for the best results and shortest compile times.

      Seems to me - he is ignoring the hard part of getting the best benefit out of the tool package... Kinda like talking about optimizing c code before talking about optimizing algorithms

      --
      I have mod points and I am not afraid to use them
    3. Re:Missed the best point by IceFox · · Score: 3, Insightful

      In the first paragraph I mention that you should use it and be familiar with it. Assuming that you already do use it, then the rest of the article applies about how you can improve a certain portion of it (distcc). You don't ignore all the books on optimizing C code just because there are plenty of algorithm books do you?

      -Benjamin Meyer

      --
      Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
    4. Re:Missed the best point by Minna+Kirai · · Score: 1

      Actually I mentioned it in the first paragraph...

      The point is that to a person unfamiliar with "compiler-intermediary" tools like distcc and ccache, the way to use them simultaneously is nonobvious.

      Does the master host keep the cache, and farm out jobs on cache misses? Or does each box keep its own ccache, which is used to fulfill compilation jobs from the master? (Obviously, one of those options is drastically worse than the other)

      Since you alluded to the possibility of distcc+ccache in the introduction, it is a disservice to your readers not to at least give a pointer to instructions on how to run them in tandem. At minimum, include this link. (And even better, add a disclaimed factoid on what your 6 minutes goes down to when ccache is in the mix)
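
      For the ccache-first arrangement described in this thread (check the local cache on the master, and only ship misses over the network), ccache's `CCACHE_PREFIX` setting and distcc's `DISTCC_HOSTS` variable are the usual knobs. A minimal Python sketch that only assembles the environment and command line; the host names, job count and `CC` value are assumptions for illustration, and nothing is executed:

```python
import os

def distcc_ccache_env(hosts, base_env=None):
    """Return a copy of the environment with ccache placed in front of distcc."""
    env = dict(os.environ if base_env is None else base_env)
    env["CCACHE_PREFIX"] = "distcc"        # ccache invokes distcc only on a cache miss
    env["DISTCC_HOSTS"] = " ".join(hosts)  # space-separated list of build hosts
    return env

def make_command(jobs):
    """Build a make invocation that compiles through ccache."""
    return ["make", f"-j{jobs}", "CC=ccache gcc"]

# Hypothetical host names; replace with your own build farm.
# base_env={} keeps the printed result small; pass None to inherit the real environment.
env = distcc_ccache_env(["localhost", "buildbox1", "buildbox2"], base_env={})
cmd = make_command(jobs=6)
print(env)
print(cmd)
# To actually run a build: subprocess.run(cmd, env=distcc_ccache_env(hosts), check=True)
```

      With this layout the cache lives on the master only, so the remote boxes never need ccache installed.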

  46. Re:Article Text (Slashdotted Server) by Fedallah · · Score: 1

    Lord of Ironhand, thanks for the polite bit of info about this possibly being viewed as karma whoring.

    I appreciate it much more than those simply accusing me of karma whoring.

  47. Perfect timing! by rice_burners_suck · · Score: 1
    This story arrived with perfect timing, as I just finished reading the one about "Build From Source vs. Packages?", and there was some discussion about distcc and Gentoo there. It got me kind of interested, so I thought I'd look into it a bit more, and then this story arrived!

    Hell yeah!

    1. Re:Perfect timing! by hyc · · Score: 1
      Of course, I wrote about doing this in my presentation to the Sun User Group in 1991 "GNU & You, Building a Better World" which I developed while working at JPL. And yes, I wrote the jobserver code that allows gmake to spawn parallel jobs without swamping the machine, the way the old loadaverage-based code did.

      The motivation for my work in 1991 was not much different than this, although back then my problem was building the X11 distro, and all of the imake crap that was in there. Since the paper itself is no longer online, a brief summary:

      • imake sucks. In the X11R4 distro, imake-generated Makefiles accounted for 11% of the disk space consumed by the source tree.
      • using extra CPUs for compilation is goodness.
      • Anybody with half a brain can write a better Makefile than imake produces, they just have to be bothered to do it.
      • Don't use for-loops to spawn recursive makes in subdirectories when there are no serial dependencies between those directories. This creates serialization points that bottleneck a parallel make needlessly.
      • On the flip side, don't write dependencies in a flat list when one element of the list depends on an earlier item in the list. When a parallel Make comes along and spawns them all off simultaneously, things will break.
      • In other words, always write your Makefiles such that the written dependencies reflect reality. Don't write
        • a: b c d
        if 'd' depends on 'b' or 'c'.
      Nice to see that people are finally catching on, after only 13 years.
      --
      -- *My* journal is more interesting than *yours*...
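
      The `a: b c d` point can be illustrated with a small scheduling sketch. This is not how make is implemented; it is a minimal Python model (using the standard-library `graphlib`, Python 3.9+) that groups targets into "waves" a parallel build could start together, first for the dependencies as written and then for the dependencies as they really are:

```python
from graphlib import TopologicalSorter

def parallel_waves(deps):
    """Group targets into waves; everything in one wave may be built concurrently."""
    sorter = TopologicalSorter(deps)  # maps each target to its prerequisites
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

# As written: "a: b c d", so the build believes b, c and d are independent.
as_written = {"a": {"b", "c", "d"}, "b": set(), "c": set(), "d": set()}
# Reality: d actually needs b and c to exist first.
reality = {"a": {"b", "c", "d"}, "b": set(), "c": set(), "d": {"b", "c"}}

print(parallel_waves(as_written))  # [['b', 'c', 'd'], ['a']]
print(parallel_waves(reality))     # [['b', 'c'], ['d'], ['a']]
```

      In the first case a parallel build starts `d` alongside `b` and `c` and breaks; declaring the real dependency costs one extra wave but is correct.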
  48. Wow... by Anonymous Coward · · Score: 0


    And to think, all I ever did to speed up my compile times was to add "-j 4" to get the other CPU's to work. Crazy.

    steve

  49. jobs/cpu? by swebster · · Score: 2, Interesting
    Try putting your localhost machine first in the list, in the middle and at the end. Normally you want to run twice the number of jobs as processors that you have. But if you have enough machines to feed, running 2 jobs on the localhost can actually increase your build times.
    About the "Normally you want to run twice the number of jobs as processors" part... is that really true? I thought it was best to just run 1 job/cpu by a long shot. Am I confused or is he?
    1. Re:jobs/cpu? by raodin · · Score: 1

      I've always heard # of cpus + 1.

    2. Re:jobs/cpu? by vadim_t · · Score: 2, Insightful

      It seems to help a bit actually. Probably due to the idle times caused by disk I/O. While one job is reading/writing from the disk the other one can compile, for example.
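
      The three rules of thumb mentioned in this thread are easy to tabulate for your own machine. A small Python sketch (the rule names are just labels made up for this example); which value is actually fastest depends on disk and network I/O, so timing a real build with each is the only way to settle it:

```python
import os

def job_counts(cpus):
    """Return the -j value suggested by each rule of thumb in this thread."""
    return {
        "one_per_cpu": cpus,
        "cpus_plus_one": cpus + 1,
        "two_per_cpu": 2 * cpus,
    }

cpus = os.cpu_count() or 1  # cpu_count() may return None
for rule, jobs in job_counts(cpus).items():
    print(f"{rule}: make -j{jobs}")
```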

  50. "Weak" computers are usefull ... by Hektor_Troy · · Score: 1

    Maybe the mini-ITX cluster would come in handy for an additional *umph* with your large compiles? If they support PXE, you wouldn't even need the cd's.

    --
    We do not live in the 21st century. We live in the 20 second century.
  51. whoa yours server's been comprimised.. by DR+SoB · · Score: 1

    Dude, check out your server, it's been hacked...

    --
    Mod +5 Drunk
    1. Re:whoa yours server's been comprimised.. by voidptr · · Score: 1

      Nah, the sysadmins are just being pricks.

      --
      This .sig for unofficial government use only. Official use subject to $500 fine.
  52. I don't have long compile times by Anonymous Coward · · Score: 0

    I don't mean to flame, and I would definitely agree that C/C++ have been enormously successful. However, I think it is sort of odd and pathetic that a programming language does not get better while other technologies do get better. I develop with both Visual Studio and Borland's Delphi. With Delphi the compile times are fast, the language is clean, and the VCL helps productivity. I find it amazing that geeks are not hailing that fine piece of software. Its component-oriented approach to development is unmatched.

    With Visual Studio the precompiled header technique really does speed up compile times. However, it is such a pain in the ass to use it correctly that I ultimately turn it off. Research on the newsgroups also shows people suggesting to bite the bullet and turn it off. The preprocessor is still an integral part of the language. What? So all you geeks like to trash every closed source technology from a company but have no problem with a language that uses the preprocessor as a crutch because it is lacking in language features?

    I recall reading about how KDE needed something for the linker to address slow load times of images, I may have botched that up since I can't recall correctly. My point is this: yet another thing that is an issue with C++.

    Not to go off topic but long compile times are not the only issue. Templates suck. I have really tried, really really tried. First and foremost, at the time I was interested in templates most compilers didn't support many of the features of the C++ standard. I admit that this has gotten better now. I have revisited templates, specifically the parametric programming, or metaprogramming paradigm. I have worked in image processing projects and I think that paradigm would be great for numerical processing projects in general. However, for application stuff the metaprogramming paradigm is WAY too difficult, partly because of the C++ syntax.

    And one last thing, I find that "templates are fast because of blah blah" to be a bunch of bullshit. The argument goes something like this: vector of TFoo is better than Delphi's TList of pointers to TFoo because the vector is the type and you don't have to dereference the pointer or do type casts. I agree that the type safety is nice, but I have run a variety of tests and Delphi's TList of pointers is not slower than a vector. In some cases TList may perform faster. So don't give me that shit that C++ is better than Delphi because Delphi "must be slow cause it's a RAD tool". I not only get fast compile times, I also get a pretty darn good optimizing compiler. I have worked on enough projects that I can say that Delphi's optimizer is on par with the Visual C++ optimizer. (side note: both of them get beat by the Intel compiler, but I don't get to use that for work :-(

    I guess what I am trying to say is this: if all these utilities have to be made for C++ then isn't it a sign that C++ may need some reworking? Don't get me wrong, there are some things I like about C++, but it has become stagnant.

    Sincerely,
    Content C++ programmer / Happy Delphi programmer / Disgruntled MFC programmer

    1. Re:I don't have long compile times by Kupek · · Score: 1

      Compiling time isn't an issue for you because your programs aren't large enough. You have no need for this tool. There will always be projects that take a non-trivial amount of compiling, no matter what language or technology you're using.

    2. Re:I don't have long compile times by Anonymous Coward · · Score: 0

      All I can say is - try building Open Office from source...

  53. Re:Article Text (Slashdotted Server) by djh101010 · · Score: 1

    So, then those are the types who get modded into oblivion. Self-correcting problem. To mod down a legitimate post (text of an unreadable site) because some of the people who do that might then post the goatse link, well, isn't all that realistic. (the fact that the goatse site is down notwithstanding).

    Perhaps you're attributing motivations to this behavior (making a useful post) that doesn't apply.

  54. love at first compile by jeet · · Score: 0

    Thanks for posting this article.. I was discussing the long compile time with my friends today and we were dreaming of getting some 16 processor machine dedicated for compilation..

    I saw this article and downloaded distcc.. It was love at first compile. Installation and first test run took 30 seconds (as told on site)... Will look at your optimizations now :D

    Now I am planning to try ccache also.. but lemme take one step at a time. :D

  55. Can distcc model be used for other apps? by -tji · · Score: 1

    Is distcc integrated into the compiler components, or is it another layer below gcc, which divides up tasks?

    If it's generalized, it would be cool to see it used for other CPU intensive tasks.. Video processing comes to mind. I would love to have a cluster bring down the times needed to:

    - Convert MiniDV home video to MPEG2 DVD's. There are professional tools to do this.. A hobbyist tool that could do clustering would be excellent.
    - Convert HDTV captures to MPEG2 for DVD archival. 1080i video processing involves some heavy number crunching.. downconverting a program for DVD archival takes hours of processing. Throw a few fast CPU's at it, and it could be done in real time.. This would make a nice back-end app for an HDTV PVR. You could take a 9GB HD program, and bring it down to 2GB.. making your PVR storage space last a lot longer.

    1. Re:Can distcc model be used for other apps? by lisany · · Score: 2, Informative

      What really happens is that you can use the so-called "masquerading" installation method, which basically means you set up symlinks called gcc, g++ and whatever to the distcc binary. Prefix your PATH with this directory and calling `gcc` will work.

      In my opinion this is easier (and better) than doing `make CC=distcc gcc`
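
      The symlink-and-PATH arrangement described above can be sketched in a few lines of Python. The distcc location (`/usr/bin/distcc`), the list of compiler names and the use of a temporary directory are all assumptions for illustration; a real setup would use a permanent directory and whatever path your distribution installs distcc to:

```python
import os
import tempfile

def make_masquerade_dir(distcc_path, names, directory):
    """Create symlinks named after each compiler, all pointing at distcc."""
    for name in names:
        link = os.path.join(directory, name)
        if not os.path.lexists(link):
            os.symlink(distcc_path, link)
    return directory

def prefixed_path(directory, current_path):
    """Put the masquerade directory ahead of the real compilers on PATH."""
    return directory + os.pathsep + current_path

# Assumed locations: adjust the distcc path and compiler names for your system.
masq_dir = make_masquerade_dir("/usr/bin/distcc", ["gcc", "g++", "cc", "c++"],
                               tempfile.mkdtemp(prefix="distcc-masq-"))
new_path = prefixed_path(masq_dir, os.environ.get("PATH", ""))
print(new_path)
```

      A build started with `PATH` set to `new_path` then finds the symlinks first whenever a Makefile calls plain `gcc`; the real compiler still has to be reachable later on `PATH`.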

  56. Re:Article Text (Slashdotted Server) by GameMaster · · Score: 1

    Whether he knew it was a re-post is irrelevant. The fact is that it was a redundant post. Marking it as such means that, ideally, duplicate copies of the same info don't show up on my Slashdot thread. Having the text of the original article mirrored is useful, but having it mirrored multiple times in the same story thread just adds more useless clutter to be sifted through.

    -GameMaster

    --

    Rules of Conduct:
    #1 - The DM is always right.
    #2 - If the DM is wrong, see rule #1
  57. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0

    My karma is -1, and it got that way by making fun of people like you :)

  58. Re:Article Text (Slashdotted Server) by Anonymous Coward · · Score: 0
    the fact that the goatse site is down notwithstanding
    how would you know that the goatse site is... nevermind

  59. Re:Article Text (Slashdotted Server) by djh101010 · · Score: 1

    how would you know that the goatse site is... nevermind

    Read about it on slashdot, oddly enough.

  60. Kudos to Martin! by Anonymous Coward · · Score: 0

    Martin made it much easier for me to come out. When I ran across his mailing lists and found how casually he could joke about these things, and how nobody else seemed offended or attacked him for it, I was floored.

    Say what you will about the open software community. Some people may be hot tempered, some may be exclusionary or quick to criticize, but I've yet to find a group so willing to accept people from all walks of life.

    Thanks to more than Martin and OSUG. Thank you to everyone for making open source a true open community!

  61. Can it be hacked to support MS VC++ on Win32? by Anonymous Coward · · Score: 0

    Inquiring minds (and those that can't afford IncrediBuild) want to know...

  62. Recursive Make Considered Harmful by JWhitlock · · Score: 5, Informative
    There was an interesting paper by Peter Miller in 1997 called "Recursive Make Considered Harmful". It makes a good case for why recursive make is a bad idea, slowing down compile times and clouding dependencies. Benjamin Meyer has proved the point again, with his use of unsermake - if you generate a non-recursive make, then distributed compiles are twice as fast.

    Unfortunately, the makefile creator most people use, automake, creates only recursive makefiles. Maybe a replacement like unsermake will get automake developers thinking about radical changes. I wouldn't mind seeing M4 go away, for one.

    1. Re:Recursive Make Considered Harmful by ewhac · · Score: 3, Interesting

      Seconded.

      When I was at Be, Inc. (RIP), one of our engineers, motivated largely by the above-referenced article, converted our entire build environment to a non-recursive structure using gmake. The result was a large speedup, as well as more effective use of multiple processors (which BeOS utilized very well). gmake would grovel over the build tree for a minute or two, then launch build commands in very quick succession. 'Twas great.

      Schwab

  63. How do you do all of this? by Xabraxas · · Score: 1

    I use distcc and sometimes it doesn't seem to help because I try to offload my compiles to my two slower computers first because I would rather keep my laptop cpu cooler. The problem is that sometimes it will actually take longer to compile. After reading about unsermake I really want to use it because, I think automake is my bottleneck. The question is how do you do it? Where can I find unsermake and how do I configure distcc to use it? The article is great on explaining what to change but not how to change it.

    --
    Time makes more converts than reason
    1. Re:How do you do all of this? by Xabraxas · · Score: 1

      I'm a dumbass. The links are at the bottom of the page.

      --
      Time makes more converts than reason
  64. why not use openmosix? by Anonymous Coward · · Score: 0

    Then you can parallelize any process that
    is not serial in nature.

  65. Scaling by Capt.+Beyond · · Score: 1

    Sure, distcc might be good for a few machines, but it doesn't scale well. Trolltech's Teambuilder is much better suited to a large-scale distributed development environment. Ask Cisco. They evaluated both distcc and Teambuilder on huge multi-processor Solaris systems. Guess which one they chose because it scaled better. That's right! Trolltech's Teambuilder! Plus, Teambuilder is much easier to set up, and has a very nice monitor for watching your compile farm. Teambuilder

    --
    -- "Perceptions create reality. By changing your perceptions you change your reality."
    1. Re:Scaling by boots@work · · Score: 1

      Got a link for that, little troll? I don't see anything on the cisco, distcc or teambuilder sites.

    2. Re:Scaling by IceFox · · Score: 1

      Well for joe shmo developer who has three computers distcc is "free" and is easy to set up. Little incentive to choose anything else.

      -Benjamin Meyer

      --
      Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
    3. Re:Scaling by Capt.+Beyond · · Score: 1

      well for joe schmoe, teambuilder comes in a personal edition allowing up to 3 computers.

      --
      -- "Perceptions create reality. By changing your perceptions you change your reality."
    4. Re:Scaling by gr8_phk · · Score: 1

      Did you RTFA? I would guess not. He discusses scaling at length and concludes that 20 machines can be nicely utilized. His initial attempt with distcc took 45 minutes, but he got it down to 6. He specifically talks about shortcomings that will limit the benefit to the first few machines and explains how to overcome that and make effective use of many more.

    5. Re:Scaling by Capt.+Beyond · · Score: 1

      Obviously Cisco tested both extensively. It's their conclusion, not mine.
      Using Teambuilder, you don't have to muck about with heaps of settings, trying to discover which one works best, it just works. Out of the box.

      --
      -- "Perceptions create reality. By changing your perceptions you change your reality."
    6. Re:Scaling by Anonymous Coward · · Score: 0

      Hey Cisco fanboy - who cares?
      Distcc works very well and it is free.
      I don't care to hear what a multi-billion dollar company uses.

    7. Re:Scaling by Zigg · · Score: 1

      Until he gets his electric bill, that is.

  66. Erm... by lisany · · Score: 1

    Two things...

    First, 10Mbit is plenty of bandwidth unless you're on a wimpy hub (people still use them). Get with the times and get a switch, it'll likely be 100Mbit anyways. Turning LZO on for 10Mbit may help, but the majority of the compile cycle (preprocess-send-compile-receive-link) will be spent doing compiling work (preprocess-compile-link), not sending and receiving. See the next paragraph for more...

    Second, I must wonder if there is a diminishing returns effect with the addition of machines (especially the lower end models). I question this because increasing the number of jobs will add load to the master server with the preprocessing and linking. Not that with a beefy server this isn't a problem - but seriously, who has machines like that? :)

    1. Re:Erm... by Anonymous Coward · · Score: 0
      Second, I must wonder if there is a diminishing returns effect with the addition of machines (especially the lower end models). I question this because increasing the number of jobs will add load to the master server with the preprocessing and linking.

      I wouldn't think so. If you are compiling N source files and linking them into M executables, then you have to do N preprocessing steps and M linking steps, period. It doesn't matter how many machines you are sending the results to; it's just a function of the number of source files that you are compiling.

      But yes, there is an upper limit on performance. If you have an infinitely fast remote machine to pass everything to, you still have to do the same fixed amount of preprocessing, so you get into something like Amdahl's Law: if your task is broken into steps A and B, and if you optimize step A a huge huge amount, then your execution speed only improves at most by a factor of (A+B)/B. For example, if initially A and B take equal amounts of time, then if you optimize A massively and improve it several orders of magnitude, you're only going to double your speed overall even though you did a really awesome job with half of it. :-)
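
      The (A+B)/B bound can be checked numerically. A minimal Python sketch, assuming step A is the part that gets farmed out (the compiles) and step B is the part that stays on the master (preprocessing and linking); the function name and the sample speedup factors are made up for this example:

```python
def overall_speedup(a, b, k):
    """Speedup when step A (time a) runs k times faster and step B (time b) is untouched."""
    return (a + b) / (a / k + b)

# A and B take equal time, as in the example above.
for k in (2, 10, 1000):
    print(k, round(overall_speedup(1.0, 1.0, k), 3))
# As k grows the result approaches (a + b) / b, which is 2.0 here.
```

      However many machines are added, the overall speedup never passes that limit, which is why a fast master matters so much.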

  67. other alternatives by Anonymous Coward · · Score: 0

    One of the reasons I recently switched a 250,000 line program from C++ to java is compile time. I'm always surprised when I read performance reports comparing C++ to java that this is rarely mentioned.

    Here (not India, mind you) a programmer costs so much more than a machine, that optimizing compilation times makes us far more cost effective. This isn't the solution for everyone, of course, but it has worked well for us.

    distcc was brought in at one point as the solution to our problems (the 10th tool on a long list of panaceas that would fix our 2 hour C++ compile time).

    But in the long run, we rewrote things in 500,000 lines of java, which still compiles in sub one minute.

    Have others had more luck with distcc than we did?

  68. network boot by Anonymous Coward · · Score: 0

    It would be even better if each slave box in the farm started with a network boot; losing the CD drives would save a serious amount of space at the small cost of hammering the network at startup. CD drives have been the least reliable part of my machines since their invention.

  69. distcc isn't so great by Lord+Ender · · Score: 2, Informative

    My roommate and I both use Gentoo. We also both have AthlonXPs. When we first turned on distcc, cutting our compile times in half, we were overjoyed. But then random compiles started failing. Not until I turned off distcc could I get some packages to compile. The point is, distcc isn't flawless.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    1. Re:distcc isn't so great by Anonymous Coward · · Score: 2, Informative

      At work, we have a bunch of gentoo boxes and discovered the same thing as you. Emerges would randomly fail. We were hoping we could figure out why, but no luck. We had to pull distcc off the boxes.
      If I had more time I would trace through things and try to figure out why they failed. But I don't have that much time.

      I still like the idea behind distcc and hope that someday (soon) they'll get it working correctly.

    2. Re:distcc isn't so great by KFK2 · · Score: 3, Informative

      Well.. here goes a couple of mod points that I spent.. but I'd thought I'd chime in..

      My friend recently had the same thing happen, and the conclusion we came to was that the compiler versions were different on the distcc servers (3.2.2) versus the client (3.2.3).. and the preprocessed code being sent off had syntax errors or something of the like when it was sent off (something to do with one of the new options in the latest gcc). I don't recall exactly what option it was or what package(s) were failing... but I do know that somewhere there was an 'if gcc-version 3.2.3 then add some options to CFLAGS' (maybe /etc/make.globals? or make.conf).

      This is one of the biggest things I have found with distcc.. compiler versions have to be pretty similar.. usually even the incremental version changes affect the compiles..

      I've not had any problems with using distcc, both with compiling Gentoo packages, along with my own projects..

      Kenny

    3. Re:distcc isn't so great by Anonymous Coward · · Score: 0

      Since you're getting spurious errors, I would bet that you are using a gentoo-kernel with the fast checksum patch: it corrupts data in burst mode, this is not a distcc failure.

    4. Re:distcc isn't so great by feronti · · Score: 1

      Compiler versions should be identical. If the compilers generate different code for the same preprocessed text, weird, bad things can happen, even if compiling and linking succeed, due to one version optimizing a function call one way and the other version optimizing it in a slightly different, and incompatible way or other similar errors. You might be ok if you compile with optimizations turned off, but you're not going to do that for production code.

      If you're running gentoo, and doing an emerge -u world, check to see if there's a toolchain upgrade in the mix (gcc or binutils). If there is, then run the toolchain upgrade separately with distcc disabled, then proceed with the rest of the upgrade. That should reduce, if not eliminate all of your issues with distcc.

    5. Re:distcc isn't so great by KFK2 · · Score: 1

      I've personally not had any problems with distcc... I'm using RedHat, and Gentoo boxes as my servers; gcc versions:

      gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
      gcc version 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r2, propolice)

      linking on the client with GCC 3.2.3 (same as Gentoo above).. I've never had any problems, but like I wrote.. when my friend updated to GCC 3.3 he had a bunch of problems...

      From what I've read, as long as the binary output is in the same format (symbols, calls etc) there should be no problems.. The problem arose because the 3.3 version had an option that added code that wasn't supported on 3.2.3... once we removed the option it worked fine (IIRC)

      Kenny

    6. Re:distcc isn't so great by feronti · · Score: 1

      Now that I've checked where I read what I said (the distcc FAQ) it would appear my original post was a little more paranoid than necessary; according to the FAQ, gcc 3.x.y and 3.x.z should be mixable. Of course, it doesn't hurt to make sure that the gcc versions match exactly, just to be safe.

      Some of the problems caused by mixing gcc versions, at least from my understanding, may not actually be caught at compile time, and you could end up with binaries that exhibit "strange behavior" with no apparent cause in the source.
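
The version rule discussed in this sub-thread can be turned into a quick pre-flight check. What follows is a rough sketch under stated assumptions: the "same 3.x series is mixable" rule is taken from the post above as the poster recalls the distcc FAQ, the banner strings are the ones quoted earlier in the thread, and the helper names `parse_gcc_version` and `same_release_series` are hypothetical.

```python
import re

# Matches the "gcc version X.Y[.Z]" part of a compiler banner.
_VERSION_RE = re.compile(r"gcc version (\d+)\.(\d+)(?:\.(\d+))?")


def parse_gcc_version(banner):
    """Return (major, minor, patch) from a gcc version banner."""
    match = _VERSION_RE.search(banner)
    if match is None:
        raise ValueError("no gcc version found in: %r" % banner)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def same_release_series(banner_a, banner_b):
    """True when both compilers share major.minor (the assumed 'mixable' rule)."""
    return parse_gcc_version(banner_a)[:2] == parse_gcc_version(banner_b)[:2]


redhat = "gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"
gentoo = "gcc version 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r2, propolice)"

print(same_release_series(redhat, gentoo))                        # same 3.2 series
print(parse_gcc_version(redhat) == parse_gcc_version(gentoo))     # exact match?
```

Requiring an exact match (the second check) is the more cautious policy and costs nothing but a little upgrade discipline.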

  70. distributed codebase by Doc+Ruby · · Score: 2, Interesting

    It would be cool to use a distcc client which took my local code diffs, distributed them around the Internet, patched the distributed "standard" version, cc'd the code, and sent back binaries to my client. Crypto hashes against the revised code could ensure that I was really getting binaries from my actual uploaded diffs. But then everyone with "difstcc" would be recompiling so much that we'd each return to our original CPU bandwidth ratios :).

    --
    make install -not war

  71. OpenMosix + make -j # by Anonymous Coward · · Score: 0

    I prefer using a few machines running OpenMosix kernel patches (say, using ClusterKnoppix) and just running my make with the -j switch. I guess the benefit of distcc is you can do everything in user land (not that the guy in the article does).

  72. When to use distcc and ccache by xixax · · Score: 2, Informative

    I went to a talk about these two tools, and getting the most out of them depends (to an extent) on knowing the nature of your compile. For example, if you are working on only a small part of a project comprised of many objects, you will probably benefit from ccache more than from distcc (in that only those objects affected by your code changes are rebuilt).

    On the same tack, the performance of distcc will (to an extent) depend on the nature of the compilation task used in the test (I am not familiar with kdelibs).

    --
    "Everything is adjustable, provided you have the right tools"
  73. Plug for Xcode... by boola-boola · · Score: 2, Informative

    While we're on the business of discussing distcc, I've gotta say... Xcode supports it quite nicely (including the pretty GUI distcc Monitor), and _ALL_ it takes is checking two boxes in the preference panel. I'm serious.

  74. Apple gcc on linux by rillian · · Score: 0, Flamebait

    I was really happy to see Apple used distcc for their distributed compiles as well.

    Too bad it's so hard to build their gcc on other OSen. My linux desktop is much faster than my macos laptop, and it would be nice to be able to add it to the pool.

    Anyone have pointers on porting apple's toolchain?

  75. Notice: Compiler Failure by nateb · · Score: 1
    Having fallen in love with distcc and its ability to speed up compiling (insert anyone who compiles like Gentoo users or Linux developers).

    gcc: Error, invalid substitution.

    gcc: Error, Syntax Error.

    make: make failed, compiler returned -1 (Error)

    --
    -- Nate
  76. Re:Article Text (Slashdotted Server) by naasking · · Score: 2, Funny

    It's not like you get promoted to "excellenter" or "excellentest" or something.

    You mean you haven't been promoted yet? Ha! n00b... :-)

  77. distcc equivalent for Java? by igbrown · · Score: 1

    I'm curious to know if there is a distributed compile or build system out there for Java. I've looked around a little, and have only found a few abandoned open source projects. I'm surprised that there isn't some distributed Ant-based tool yet. Anyone out there know of anything?

  78. pedantic comment by Anonymous Coward · · Score: 0

    Good stuff if you have a bunch of CPUs around. A test lab after hours maybe?

    But dude, 'then' != 'than'! Once might be a typo, but 14 times? That's something else.

  79. You're a bit outdated. by devphil · · Score: 1
    Unfortunately, the makefile creator most people use, automake, creates only recursive makefiles.

    And there's a damn good reason for it, too, but that's neither here nor there. Anyhow, this was fixed so you can do non-recursive stuff if you want to now.

    Unfortunately, the very latest automake versions are trying to be way, way too clever, thereby breaking stuff in lots of projects. Time to throw it out and use something else.

    I wouldn't mind seeing M4 go away, for one.

    Automake is a Perl script. It doesn't use M4. (Using M4 would make it more readable, ha.) You're thinking of autoconf.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
  80. icecream by Seli · · Score: 1

    It's strange the article mentions unsermake but not icecream, which IMHO beats distcc, despite not being officially called stable. Icecream has a scheduler and is therefore noticeably less stupid with distributing the builds (it happened to me repeatedly with distcc that it sent the first job to the most loaded node, which kind of sucks if you just changed one file).
    Icecream can be obtained the same way as unsermake; it's located in kdenonbeta/icecream.

  81. Question for ccache by master_p · · Score: 1

    I understand the usage of distcc, and it seems quite helpful. But what about ccache? The available info does not say much, except that it "caches" the output in the following way: if the object files are already present, they are not compiled again.

    But I thought that the 'make' program does exactly that: if a source code file is newer than the object file, then the source file is compiled; if not, the current object file is used.

    What exactly is it that ccache does that make does not?

    1. Re:Question for ccache by DarkMan · · Score: 1

      make clean && make

      ccache will cache the previous compiles, and, if they haven't changed at all, use the cached results. This allows the certainty of a clean build to be gained in significantly less time. Make won't do that, because it was just cleaned.

      Additionally, I believe that ccache uses a global cache. So, if, for example, you are compiling a couple of Linux kernels, each patched differently, some of the compilations will be the same between both trees. ccache will recognise this, and only compile once, and use the cached results the second time. This is impossible with make, as these are two different invocations of make.

      ccache is not something that everyone needs. But for those doing lots of development, it can really save a lot of wasted time.
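
The difference from make's timestamp check can be illustrated with a toy cache keyed on content rather than on file times. This is only an in-memory sketch of the idea: the class `CompileCache`, the stand-in `fake_compile`, and the choice of SHA-256 are assumptions for illustration, and real ccache has its own hashing scheme and on-disk store.

```python
import hashlib


class CompileCache:
    """Cache object code by a hash of what actually determines the output."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # hypothetical callable: (source, flags) -> bytes
        self._store = {}             # key -> cached object code
        self.hits = 0
        self.misses = 0

    def _key(self, preprocessed, flags, compiler_id):
        digest = hashlib.sha256()
        for part in (compiler_id, "\0".join(flags), preprocessed):
            digest.update(part.encode("utf-8"))
            digest.update(b"\0\0")   # separator so fields cannot run together
        return digest.hexdigest()

    def build(self, preprocessed, flags, compiler_id):
        key = self._key(preprocessed, flags, compiler_id)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        obj = self._compile(preprocessed, flags)
        self._store[key] = obj
        return obj


calls = []

def fake_compile(preprocessed, flags):
    # Stand-in for invoking the real compiler; records each invocation.
    calls.append(preprocessed)
    return ("obj:" + preprocessed).encode("utf-8")


cache = CompileCache(fake_compile)
unit = "int main(void) { return 0; }"
cache.build(unit, ["-O2"], "gcc-3.2.3")   # miss: the compiler runs
cache.build(unit, ["-O2"], "gcc-3.2.3")   # hit: same input, e.g. after make clean
cache.build(unit, ["-O0"], "gcc-3.2.3")   # miss: different flags, different key
print(cache.hits, cache.misses, len(calls))
```

Because the key depends only on the preprocessed text, flags, and compiler, the same unit compiled from a second source tree would also hit the cache, which is the "global cache" behaviour described above.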

  82. Recursive Make Considered Harmful Considered Dumb by asuffield · · Score: 1

    That paper makes a spectacularly bad case. It provides no serious analysis to back up its wild claims, and mixes variables quite horribly. Approximately one page of the paper is spent talking about recursive makefiles, while the bulk of it is just about various ways in which people write bad makefiles, which apply equally to recursive and monolithic ones. It's like saying "We administered the drug to the patient, and then he was hit by a cruise missile. The patient died, so we have to assume the drug is not safe for human use".

    This is the sort of paper that would be rejected out of hand by any serious journal.

    Fortunately, most people just use automake rather than paying attention to this nonsense.

  83. If you want faster builds by DrSkwid · · Score: 1


    build smaller things

    the record for compiling a plan9 kernel is 15s

    I built & installed the kernel and the whole distributed userland in 45 mins on a Duron 800MHz.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  84. YOU FAIL IT! by Anonymous Coward · · Score: 0
    Sorry, no banana.

    This is slashdot. You're not supposed to say anything positive about Sun. Java isn't Open Source or GPL, after all.

  85. Mostly... by DarkMan · · Score: 1

    Yes and no.

    First off the generalised methods you allude to are MPI, the older PVM, and there's Mosix too.

    MPI and PVM are framework libraries that allow for code to be written to take parallelism into account. They tend to be used for numerics calculations (which was their birthplace), simply because numerics are CPU bound. There are others, that are even more numerics centric (HPF - a Fortran variant, for example), but MPI should probably be the target of choice for new code, including non-numerics based calculations. Note that the term 'Beowulf cluster' implies MPI. The biggest point to note with MPI is that it allows for communication between nodes, and thus can be used for calculations that are not trivially parallelisable.

    Mosix is subtly different. It's a patch to Linux that distributes processes across multiple boxes. This tends to work better for jobs with long runtimes, as opposed to many smaller duration processes, however. For example, Ralphzilla>

    Parallel applications need to be written to target MPI or PVM, whereas Mosix doesn't need special targeting. On the other hand, any multiprocessor aware application will be more efficient at using them than any automated solution. Still, Mosix may well be sufficient for most purposes.

    The downside to MPI &c is that they require libraries installed. Which raises dependencies, so most such applications tend to use their own libs, built in. This is actually not a bad idea - in that it allows more specific tailoring to take place. On the other hand, MPI &c libs can be tailored for a specific set up (e.g. using non-Ethernet connectivity - such as a mix of Ethernet, Myrinet and Papers, for example). That's a little out of the intended usage for distcc, however, where minimal set up times are desired.

    On the video front, then, transcode has a built-in cluster mode, for pretty much what you were talking about. Again, its methods are all internal, but that's not an issue here.

    1. Re:Mostly... by Minna+Kirai · · Score: 1

      and thus can be used for calculations that are not trivially parallelisable.

      MPI is not what he wants. Both of the applications tji asked for are video recompression tasks. Those fall deep in the "trivially parallelisable" category.

      Just split up the input file into megabyte chunks, allow each helper computer to convert one chunk, then concatenate the results on the master. There is no need for the helper computers to communicate amongst themselves while the calculation is going on, which is the ability MPI enables.
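
The split, convert, concatenate scheme described above fits in a few lines. This is a minimal sketch only: worker threads stand in for the helper computers, `convert_chunk` is a placeholder transform rather than a real encoder, and all function names are made up for illustration. A real video job would presumably also need to split at points the codec can restart from.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Cut the input into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def convert_chunk(chunk):
    # Placeholder for the per-chunk conversion done on one helper.
    return chunk.upper()


def farm_out(data, chunk_size, workers=4):
    """Convert each chunk independently, then concatenate on the master."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so no reordering is needed.
        converted = list(pool.map(convert_chunk, chunks))
    return b"".join(converted)


print(farm_out(b"abcdefghij", 3))
```

No worker ever needs another worker's data, which is why nothing like MPI's inter-node messaging appears anywhere in the sketch.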

    2. Re:Mostly... by DarkMan · · Score: 1

      Ur, he asked 2 things. One was about frameworks, the other was for an application. I covered both - see transcode mentions.

    3. Re:Mostly... by Minna+Kirai · · Score: 1

      Ur, he asked 2 things. One was about frameworks, the other was for an application.

      Ur, he asked one thing, and it was neither of the two things you "answered".

      Transcode is handy to mention, but MPI should not be considered as the path forward for problems of this class (which includes both compiling and video encoding)

  86. automake - unsermake by leuk_he · · Score: 1

    That is why he mentions unsermake as an automake replacement for building files in parallel. This makes distcc scalable over many more machines.

  87. Re:Recursive Make Considered Harmful Considered Du by Minna+Kirai · · Score: 1

    That comment makes a spectacularly bad case. It provides no analysis to back up its wild claims. Approximately zero lines of the comment have to do with the paper, which it essentially mischaracterizes.

    That paper makes a spectacularly bad case

    It makes a fine case. The worst part is that it exaggerates the value of its own minor insight. The grandiose title harkens to the famous "Goto Considered Harmful", which in its time was a more insightful position.

    Nobody should be surprised that globally correct choices cannot be decided with only locally correct data (for a non-greedy process, of course).

    Moreover, the actual problems caused by suboptimal makefiles pale in comparison to what havoc goto can wreak. Anything wrong with makefiles can be solved by Moore's law (wait for the hardware to get faster, so you can do full rebuilds quickly). But spaghetti code makes it more difficult for programmers to work with software, and there has been no observed exponential growth curve of human intelligence.

    people write bad makefiles

    That's a cop-out. The Makefile system has turned out to be too flexible for most needs. Because the build system relies on authors of individual makefiles, the behavior of different Makefiles can be completely different (they're arbitrary programs, after all). That problem is analogous to the non-existent "package manager" on Microsoft Windows. Each Windows installer is an arbitrary program that might do anything, and whose actions cannot be reasoned about by software tools.

    Furthermore, having one makefile in every directory is an almost assured way to produce bad makefiles.

    which apply equally to recursive and monolithic ones.

    Wrong. There is an inescapable difference in the performance (both speed and correctness). Recursive simply cannot compare with monolithic.

    Note that "monolithic" doesn't necessarily mean the makefile is stored in only one file on disk. A collection of files assembled via include directives is equivalent to monolithic, but somewhat easier for revision control. Non-"make" build control processes, such as Ant or those provided with some IDEs, also share the advantages of monolithic makefiles.

    The software industry has already demonstrated its support for RMCH, because all new "yet-another-better-than-make" projects take its ideas as unavoidable preconditions.

  88. You have terrible English skills by Anonymous Coward · · Score: 0

    It's amazing what American universities are failing to teach these days...

  89. ... room in my cube for fifteen computers ... by mnemotronic · · Score: 1
    I only had room in my cube for fifteen computers.

    I wonder how much noise and heat is generated by 15 PCs running in a small cubeacular office environment....

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.