Should You Pre-Compile Binaries or Roll Your Own?
Jane Walker writes "The completion of pre-compiled packages and maximizing machine performance are two powerful incentives for Windows admins to use Linux and compile an OSS package." TechTarget has an article taking a look at some of the "why" behind rolling your own. What preferences have other Slashdot users developed, and why?
I feel that Gentoo Linux offers the best of both worlds, with their ebuilds. :-)
Was I the only one who found that this article didn't really shed too much light on whether or not you should compile your software from source?
By the way, I know the benefits of compiling from source, but how this made slashdot, I don't know.
What the hell's a "gewie?"
I have a feeling that this will turn into a general debian vs gentoo style debate, and we all have our preferences.
The big benefits of precompiling are that you don't need to support 1500 different sets of libraries in your development environments, and that the package will generally work right with minimum fuss.
The big benefits of source-based distros are the ability to tailor packages to each install (ie the ability to compile certain features in or out), to choose optimizations on each package (do you want -Os, -O2, -03, or are you really daring -> -ffast-math?).
There are some things that cut both ways - often a given package can be compiled using one or more different dependencies and if you want this flexibility then source-based might work better. On the other hand, it also means that if you have 500 different users of your distor you have 495 different configurations and bugs that are hard to reproduce.
As for me - I like source-based. However, if I had to build a linux email/browser box for a relative I'd probably use Debian stale...er...stable. The right tool for the right job.
Conversely, if programmers sufficiently document their binaries, that's not as much of a problem. URLs for other binaries, or instructions for checking/compiling dependencies, can speed up that process.
Of course, binaries are a huge advantage to non-experts and beginners, who just want a program to be there. They don't care about maximizing efficiency, they care about getting their work/fun done.
So really, it entirely depends on the application. For programs that are typically for the hardcore programmer/user crowd, source-only makes sense -- those people who use the program are going to know how to compile and check everything already. But for programs like, say, a CD burning program? I definitely think both should be provided, and installation for both should be well documented. Given how easy it is to provide a binary, even if it's "fat," there's no reason why a more popular program can't provide both versions.
Recompiling software gets you almost nothing. Maybe 10% more performance, at the very maximum.
There are special cases like when you want to use dynamic libraries instead of static (to save memory), or when there's a major architecture change (PPC -> x86 for Apple). In those cases you'll gain something.
Another case is rewriting your program to use CPU-specific instructions, like Altivec or SSE3. That, in certain circumstances, will speed up your program.
But if you're compiling OO.org or Mozilla because you think your 686 version will be 100% faster than the 386 version, you're wrong.
My other car is first.
If the time required to compile (plus trouble) is less than the time/trouble saved through performance gains (assuming you actually compile it correctly to get said gains), then compile. Else download and install.
But then again, if you just have a lot of time on your hands, it doesn't really matter what you do.
The difference between spam and poop is that you don't have to dig through septic tanks looking for real food. -- Me
In my field, ten percent is everything. If I could increase performance by ten percent, I'd get a 100% bonus for the year. My servers need to handle so much data and are always backlogged (and adding more on is exensive!) Don't trivialize ten percent.
Rank my idea: http://www.sinceslicedbread.com/node/531
It's true that gcc can generate P4-specific code, for instance, but in most cases and with generic compilation flags this will barely make a difference. I am personally not aware of a single mainstream linux app that does 5% better when optimized for P4 as opposed to generic 386 (I'm not including here apps that have hand-crafted assembly for a couple of instruction-set architectures, like, say, mplayer). I guess things might change slightly with gcc 4.* which has automatic vectorization, but in my experience automatically-vectorizable C code is very rare, unless written specifically that way.
The Raven
Up until this past year I've been a big fan of using a very stripped down Redhat (then later Fedora) distro and doing my own custom compilations of things like OpenSSL, Apache, OpenSSH, DHCPD, BIND, Courier, MySQL, PHP, XFree86 and X.org, and of course the Kernel itself. The main reason? It "just works". When I originally started using Linux and used RPMs I was very annoyed with them. I think RPM sucks. I also think BSD ports suck too. I don't like using stuff on my machine that I didn't massage first. That's just the way I am.
A co-worker introduced me to Gentoo late last year and I have to say I am very impressed. It's much faster than the optimizations I was using. Of course I didn't compile everything in RedHat or Fedora by hand. That's why Gentoo really rocks. You CAN hand compile everything from the ground up! I also used to use Linux From Scratch. And YES, I do use this stuff on production machines. You just can't get any better performance or security by doing it all yourself. The only reason to use package managers is if you are new to Linux or just don't want to learn much. But if you don't dig in, then you're at risk of thinking that something in your installation is broken, when it's not. I've seen many people throw up their hands saying, "I have to re-install my Linux box dammit!" when all they really needed to do was fix a symlink, edit a config file or downgrade something for compatibility reasons.
For example, on a laptop at home I decided I wanted to use the Xdmx server from X.org, so I hand compiled X.org. After that, I kept having this problem where the acpid (tracks the laptop's battery life among other things) daemon wasn't starting and would produce an error message whenever I logged into Gnome. I dug around on the net for quite a while and finally found out that the version of X.org (a devel version, not a stable version) grabs ACPI if you are using the RedHat graphical boot screen. The fix? Stop the RHGB from running by removing the RHGB kernel option. I think a lot of people would have assumed they hosed their installation and reinstalled if that problem really bothered them. It's not hard to find solutions to most problems in Linux no matter how obscure. That's why only knowing how to use pre-compiled binaries is a detriment if you're serious about using Linux.
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
Do these guys even know what deplpoyment means?
They are not only wrong in their conclusion, but the article barely scratches the surface of the question.
Put simply, compiling software on your own is fine for a one off, or your desktop, or your hobby machine.ou.. or if you either a) need the latest wizbang features (and maybe can put up with the latest gobang bugs) or b) need really specific version control or c) can't find a precompiled package with the right ompile time options set.
Other than that, you should pretty much always use pre-built.
Sure, you can build your entire system from scratch if you like, libc on up. Why? The performance increase will be so minor that you will have to run benchmarks if you even want to be able to tell there was a change. You will then have to recompile everything you ever decide to update.
This strategy is a nightmare as the systems get more diverse and complex.
it also has nothing to do with deployment. Deployment is what you do after you have decided what you want and made it work once. Deployment is when you go and put it on some number of machines to do real work.
I would love to see them talk more about the differences in deployment. With precompiled packages from the os vendor, ala debian or redhat, its easy. You use apt-get or rpm and off you go. Maybe you even have a redhat network satalite or a local apt repository to give yourself more control. Then you can easily inject local packages or control the stream (no, sorry, I am NOT ready to upgrade to the new release)
but should you compile "most of the time"? hell no!
It is, in fact, the vast minority of software where you really need the latest features and or special compile options. Its the vast minority of the time where the performance increase will even be perceptable.
Why waste the time and cpu cycles? Takes over a day to get a full gentoo desktop going, and for what? I can have ubuntu installed and ready to go with a full desktop in maybe 2 hours.
Lets take the common scenario.... new openssl remote root exploit comes out. The email you read just basically said, in no uncertain terms, that half your network is now just waiting to be rooted by the next script kiddie who notices. Thats lets say... 50 machines.
Now your job is to deploy a new openssl to all these machines.
You could notice that the vulnerability came out in such a time frame that they allowed the OS vendors like debian to release fixes (often happens, if not they are usually out within a very reasonable time frame)... so you hit apt-get update && apt-get upgrade
or maybe you just promote the packgae into your repository, and let the automated updates deploy it for you. You go home, have a coffee, and be ready to log in remotly if anything goes tits up.
Now if you are hand compiling what do you do? Compile, test. And then um.... ahh... scp the dir to 49 machines and ssh in and do a make install on each?
How is this better than using a package manager again? Now you ahve dirs sitting around, you have to hope that your compile box is sufficiently similar to the other boxes that you didn't just add a library requirement (say because configure found that yoru dev box has libfuckmyshit.so installed and this neat new bits of openssl can make use of it)
How about when a box crashes and burns and you now need to take your lovingly handcrafted box, and rebuild it, and put all that lovingly compiles software back on it.
Fuck all that... give me a good binary distro any day of the week. I will hand compile when I HAVE to... and not before.
-Steve
"I opened my eyes, and everything went dark again"
If you're compiling your own for performance reasons, don't bother. There's a few packages that can benefit from being compiled for specific hardware, the kernel and libc, for example, and things like math libraries that can use special instructions when compiled for specific hardware. For the most part, though, your apps aren't going to be limited by the instruction set, they'll be limited by things like graphics and disk I/O performance and available memory. IMHO if you're trying to squeeze the last 1% of performance out of a system, you probably should look at whether you need to just throw more hardware horsepower at the problem.
The big benefits and drawbacks of custom-compiled vs. pre-built packages is in the dependencies. Pre-built packages don't require you to install development packages, compilers and all the cruft that goes along with a development environment. You can slap those packages in and go with a lot less installation, and you can be sure they're built with a reasonable selection of features. On the other hand, those pre-built packages come with dependencies. When you install a pre-built package you pretty much have to have the versions of all the libraries and other software it depends on. By contrast, when you compile your own packages they'll build against the versions of other software you already have, using all the same compiler options, and they'll probably auto-configure to not use stuff you don't have installed. This leads to a more consistent set of binaries, fewer situations where you need multiple versions of other packages installed (eg. having to install Python 2.3 for package X alongside Python 2.4 for package Y) and overall a cleaner package tree.
Where the cut-off is depends on your situation. If there's only a few instances of dependency cruft, it may not be an issue. If you have a lot of dueling dependencies, it may be worth while to recompile to reduce the number of different versions of stuff you need. If you've got externally-dictated constraints (eg. only one version of OpenSSL is approved for your systems and everything must use it or not use SSL at all) then you may have no choice but to compile your own if you can't find a pre-built package that was built against the appropriate versions of libraries.
Another missed point is it's usually easier for the developers and hardware vendors. Easier to distribute source code than maintain dozens of binaries. Just another advantage of open source. Many projects have a few binaries for the most popular platforms, and source code for those who want to use something less common.
Latest version and leading edge. I've been trying Gentoo. In most cases the extra performance isn't much, and isn't worth the hassle. And Gentoo defeats one of the attractions of roll your own: the leading edge. There is always a lag between when a project releases a new version, and a distributor can get around to turning the new version into a new package. Article didn't mention that aspect either. If your distro is slow, you can bypass them and go straight to the source, or, if available, the source's binaries. Ubuntu Linux 5.10, for instance, is still stuck on Firefox 1.0.7. I'm not talking about checking out the very latest CVS source tree, just the latest release.
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
However, I don't consider performance to be a particularly big benefit. Once you've got gcc doing it's basic -O3 or whatever, anything else is marginal at best.
There are, however, two things I get out of it which are useful to me. These should not be ignored:
Of course, there are drawbacks:
I guess it's just a case of "know the tool you're using".
...is that you get the benefit of the optimization that can only occur when compiling for a specific machine without actually needing to do the work yourself.
.NET and Java the IL/byte code is JIT'ed before execution and can have these machine-specific optimizations inserted.
:)
In platforms like
No need to break out gcc or try and resolve dependencies. Ahh... I loved the managed world.
I've used both Debian and Gentoo. I am now (mostly) using Gentoo and not Debian. I hope you might find my perspective helpful. (But it should also be stated I also use ports on FreeBSD, and I have come to the conclusion that source-based distros are easier for me to use.)
How I'd love to have so much dead CPU time! If your computer's not doing anything for you, why bother having one? Truth be told, one can reap performance gains in more definitive ways than trying to have your compiler make different binaries. As you indicated, running few processes helps. As can swapping in a custom kernel and/or using a faster filesystem (both of which you can do on Deb fairly easily).
I don't usually see a huge advantage, but it does depend on the app. For desktop users, app launchtime is often significant. I do think using '-Os' to make smaller binaries (which get into memory faster) does usually create a noticeable benefit. And for workstation/server apps, every few percent for "faster" apps could be helpful to some people (but I agree that it is typically only a few percent). But just leaving apps open is often "good enough" for load time & perhaps there aren't many who really need the extra few percent.
Yes. Absolutely. Particularly when you have relatively common needs across all apps. Perhaps you want to run an X-less server? Or perhaps you want to have apps with only KDE/QT or only Gnome/GTK+ or what not. USE comes to the rescue.
You are typically correct. But the thing is that foo more often than not will have optional support for some feature. But some gentoo ebuilds do, indeed, have USE flags that aren't just ./configure flags for some applications. For example, you can install xpdf with the 'nodrm' use flag, which applies a patch to cause xpdf to ignore drm restrictions. Indeed, for making custom ebuilds, USE flags prove to be quite useful: you can use them to test multiple patches without the need to apply a given patch to all installations & can easily check which features a certain app has (by checking which flags it was emerged with).
Gentoo's approach gives the user more choice. It preserves your old files by default. You can choose to replace the old config with a newer config or, more useful, merge (typically using sdiff) in changes between the old and new config. It doesn't choose what is best for you.
Debian's defaults are normally sane. But not always.
When I started using Linux I started with an old Slackware, some time later I started to compile everything myself and test different gcc options, and even patched some programs to use a devfs only system. About two years ago I decided that I have nor the time neither the will to keep doing it, I switched to debian unstable, as I was used to be always on the edge. Now I run debian stable in every machine, the server, the desktop and the notebook. I have compiled, adjusting by hand every option, some packages like wine and mencoder, but just because I use both to heavy filter and encode The Simpsons from DVB-T everyday and it take about 10 hours to complete the job, so every little percent is important. Everything else doesn't need such hard tuning.
Just compile speed-critical programs yourself and get everything else from a stable distribution.
I [may] disapprove of what you say, but I will defend to the death your right to say it.
Fine the subject doesn't make complete sense.. BUT... doesn't compiling code with Intel's cc result in significantly better binaries than any flag you can throw at gcc?? From http://www.intel.com/cd/ids/developer/asmo-na/eng/ 219902.htm, MySQL claims 20 percent performance improvement over gcc.
I'm not saying we all have access to icc, but if someone wants to make a binary available, I'm more liable to use that than compiling from source. Call me crazy. And I know someone will.
CommentBot 0.7a running with args "-module irritate,disagree -target random"
(Full disclosure, I've been using Debian since 1996. I'm backporting packages to stable as I write this, so I'm definitely not coming from a "one size fits all" perspective.)
If there is one thing that appeals to me about Gentoo (as I understand it), it's the concept of meta-configuration at build time. Unfortunately, lots of options in packages get configured at BUILD TIME, so either the binary packages have those options on, or they don't. When this is the case, if the distro doesn't provide multiple binary packages with all of the useful configurations, then you end up having to build from source. (IMO, building from source means compiling from source packages whenever possible.)
So I like the concept of saying in one place, "build for KDE", "build with ssl support", "build with ldap support", etc. Maybe someday everything will be runtime configurable, but until that day, I'll be wishing for that meta-level of configuration...
Having said all of that, check out apt-source, apt-build, and their ilk if you're interested in "Debian from source". Debian binary packages solve about 97% of my problems so I'm not usually looking for "make world" kind of functionality.
Enough rambling for now.
A vast number of regularly updated packages. Put simply, I can emerge almost anything, whereas every other distribution I've used, sooner or later I come across a package I need which doesn't have an RPM or what have you, and I have to build my own complete with the dependency hell that can entail.
More than that, almost half the time you can install an updated package not yet in Portage by renaming the most recent ebuild to match the new version number.
If that fails, you can invest a few hours and learn how to write your own ebuilds... It's insane how easy it really can be to get any bleating edge bit of software running.
What does this button d$#%* NO CARRIER
Boss: Jack, I want this new webserver machine to be up and running as soon as possible
Jack, the Gentoo Admin: Yes, sure chief! It will take just a couple of days to compile everything...
What do you think is gonna happen next?
In my experience, it is often necessary to recompile from source simply to have more than one version of the same package available at once! Too many pre-built binaries assume they are the only version in the universe you could want in /usr/local/bin.
/opt/pkgname/1.0), and at the same time they should assume their dependencies can also be found there. The /usr/local hierarchy would remain, but as a hierarchy of references to specific versions of things. The /usr/local hierarchy would contain selected default versions, it would be used for efficient runtime linking (have "ld" search one "lib", not 20 different packages), and it would be targeted for dependencies that truly don't care about the version that is used.
For some packages a recompile is merely annoying, having to download and reconfigure with a new prefix and rebuild; but for others, it can be a horrible web of configuration options to find numerous dependencies in special locations. This complexity can be really frustrating if all you want to do is relocate the tool so two different versions can be installed.
Pre-built binaries should assume by default that they'll go into a version-specific directory (say
There are other details, of course...for example, it may matter what compiler you use, you may want 32-bit and 64-bit, etc. But the basic principle is still simple: have a standard package version tree on all Unix-like systems so you can "just download" binaries without conflicts, once and for all.
"Microsoft killed my company, I hold a personal grudge. I don't use Microsoft products and neither should you."-JWZ
Precompiled binaries suite most situations. IMO, it's silly (and time consuming) to compile everything from source all the time. However, if you need a newer version of a program or some specific settings comiling from source is the way to go. Arch linux approaches this well. They offer optimized, often completely up-to-date, precompiled binaries in pacman, but allow you to build from source with the Arch Build System if neccissary.
Not only had I built every package from source (using ports), I also took the trouble to rebuild the base system and kernel with a custom configuration and options.
The benefits to some of this were obvious; the FreeBSD GENERIC kernel at the time seemed (to my eyes) to suffer a massive performance loss from its configuration. Anyone running FreeBSD *must* build at least a custom kernel, even if they use the binary distribution of everything else.
It was a lot of effort. What did I get out of it? It was by the end one of the speediest systems I had ever used since the days of DOS. Most programs loaded faster than their binary equivalents (on older machines the differences were more glaringly obvious, such as the time it took to initialize X).
One time I clocked my old machine, running a custom built FreeBSD installation, against the other computers in the house from power-on to a full desktop (after login).
On my machine, the entire affair (BIOS, bootloader, bootstrapping, system loading, X, login, desktop environment (WindowMaker in this case)) cost a mere 45 seconds. My father's machine, which was in all respects a faster computer, loaded Windows 2000 in the course of perhaps two minutes. Also, I stopped timing after the desktop came up, but Windows does continue to load and fidget about for a good while after that. The extra time taken for it to settle down would have cost it another minute, but only because of all the crap my dad had set to load, which I don't blame Windows for.
The kitchen computer also ran Windows 2000, but had a slimmer configuration, so it loaded shortly over a minute. FreeBSD, however, still beat them both badly.
In light of my own experience, compiling from source can get you some rather wonderful results. However, I noticed that not all systems were created equal. While FreeBSD GENERIC was as slow as molasses, I find in linux that the binary kernels that come with my distributions seem to load and operate just as fast, if not faster than my custom build of FreeBSD. In linux I have used only binary packages, and the system overall "feels" just as fast, though some operations are a little slower (like loading emacs ;)).
I appreciate the arguments presented by both camps, but I feel the need to point out that some are too quick to downplay the possible performance gains offered by custom builds, because they certainly exist. Sometimes they can be noticeably significant.