Genetic Algorithms and Compiler Optimizations
mfago writes "Scott Robert Ladd has written an enlightening article and accompanying software package that utilizes genetic algorithms to optimize the optimization flags fed to a compiler. Those who have tried to squeeze the last drop of performance
from a code know that it can be very difficult to determine the set of optimization flags beyond -O3 (with gcc for example) that yields the best performance. Lots of background links included in the article."
Toss around terms like genetic algorithm and neural networks, and some are dazzled by the elegance and simplicity of it. Well, then there's reality, which is often neither elegant nor simple. In Go, there is a computer player that utilizes neural networks, yet it is near the middle of the pack.
A good dose of skepticism should accompany any examination of the dazzingly elegant solution.
-Libertarian secular transhumanist
That is, fiddle with settings at random until it's acceptable?
It turns out that hand tuning will ALWAYS get you the best optimizations, but human nature dictates that a certain portion of optimizations will never be found. Genetic algorithms will find some of these optimizations in a reasonable time, but not all. So the bottom line is, no code can ever be 100% optimized, unless it is compiled for weeks.
This is a very neat application of genetic algoriths! Haven't seen something so interesting since this Tron cycle game.
It actually seems like a good idea.Some problems I can imagine are optimization flags may have non speed related side-effects?
:) or maybe the base-line is already close to optimal, and so it isnt a hard problem?
However, this seems like a great candidate for GA's - a fitness function (ie speed of execution), and a nice binary vector of atributes (flags).
Interesting that the gains aren't that great it would seem to sugges that the flags don't do much
But Kudos nevertheless. No expert in the GA field, so its possible someones tackled this before, but if they haven't, extra Kudos.
Winton
An important thing to note is that this isn't coming up with new optimizations, but rather finding the optimal mix of pre-existing gcc optimizations.
It's an important distinction. When you're trying to do breed some really new system using Genetic Algorithms, it's possible to get some very fragile results, i.e. systems that do the job incredibly efficiently, but only under certain conditions...
For instance, I remember reading some folks were trying to "breed" a circuit using Genetic Algorithm like techniques that created a "pulser" using as few gates as possible. They got something that used like one tenth the number of logic gates of the previous low record...but didn't work when they moved the setup away from their desktop PCs. Turned out what they "discovered" was a sensitive RF receiver that picked up (I think) something like the normal 120V cycling, and utilized. Similarly, a lot of the bred ciruits are extremely sensitive to temperature and other conditions.
So breeding for efficiency can be a bit dangerous, you get things that rely on odd side-effects and environmental states. Though I think the method the article describes isn't much more dangerous than normal optimization selection, though admittedly more likely to stumble over some unusual flag combination that might break under certain circumstances.
In short, *if* GA is ever going to let software development become more like gardening than engineering, (something I've herd as a goal) we're going to have to find ways to apply very general evolutionary pressures, complex test environments.
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
There is more information on his webpage. It's not strictly true -- compilers have moving targets (different architectures and hardware optimisations over time, greater complexity in languages, etc etc), and for important things like scientific applications compilers can really optimise code; but in general R&D towards compiler development seems to be sorely lacking compared to other areas.
I work on IA64, which is fundamentally designed to take advantage of smarter compilers. While there are some interesting projects (ORC for example) nothing is really close to ready to take over gcc/intel compiler. We really want something that compiles once, can profile code execution and then recompile to optimise based on the execution profile; something along the lines of what this article talks about but built right into the compiler.
As such, is there any justification for using a GA to do this rather than, say, simulated annealing? I'd would be interested to see a comparison of the two approaches. Especially in light of papers like this one.
Any time you consider using a broad-front search algorithm like a GA, a neural net, or simulated annealing, try plain-hill climbing first. If you try any broad-front search, compare it with plain-hill climbing. Only if the space being searched is dominated by local maxima (but not too many of them) do the broad-front algorithms help. And they're always slower.
If this guy had demonstrated that a GA did better than a plain hill-climber, he'd have an interesting result. But he hasn't demonstrated that.
Having worked on applying GAs to multi-objective optimization, I don't belive that this technique can be used effectively to optimize most programs.
The main issue is to compare the individuals generated by the genetic algorithm. To do so, we need to both compile the program under the specified settings, and then to be able to benchmark its performance. In my current job, a full build of our main product takes over 12 hours on an 8 CPU machine. Using a pretty conservative estimate, 100 generations with 100 individuals each, we'd be talking about more than a year of CPU time for a single run of the algorithm!
Even if we could ignore the amount of time required for compilation, we still have a second, more important flaw: Most programs out there are not really that easy to benchmark automatically. Database applications might need to go back to a known DB state to be able to run the benchmark faily. Also, server apps need to have the exact same load for every test if we want to be able to compare two executables fairly. This problem is increased when many compiler options will just create a 1% performance improvement or so. A poorly run system could lead the comparison function to just pick the wrong executable if the two executables didn't run in the exact same conditions.I see how using GE for this task has a high coolness factor, and how it might even be usable for applications that are by nature easier to benchmark, but don't expect this technique to be applicable to enterprise-sized server applications, or even most GUI based apps any time soon.
...because they were able to pick out combinations of optimizations that measuring each optimization individually wouldn't have discovered.
Consider the case in huffbench when
-fmove-all-movables, -freduce-all-givs and -ftracer were discovered to result in great improvements when used with the other GA-selected optimizations -- but not when applied directly to -O3.
Sure, these situations may be specific to the test cases -- but that's not to say his software can't be used to evaluate real-world scenarios as well, such that companies doing a final release compile of their code let a GA churn for a few days to determine the best flags to use.
That's a Good Thing, no?
That's exactly the point. The goal is not to find a set of gcc switches that will work best on any program on any supported platform. The goal is to find the set that works best with this particular input, without human guidance.
10% or 20% improvements in compiler performance are of largely academic interest only these days most software's time is spent waiting on user input, disk IO, or network activity anyway?
First of all, what if you can get the 10% or 20% basically for free? That is, simply run your compile for a few unattended days to get the optimization? What is not worth human tweaking time can easily be worth machine time. Gentoo, although I do not use it myself, appears to be a vibrant Linux distribution based exactly on this point.
Secondly, most existing software are running on small CPUs all around you, not on your desktop. The constraints of memory usage and CPU power are still very real for them.
Finally, consider also that more efficient software may result in more idle hardware and lower power consumption. Imagine if all PCs in the world can get by with 5% less electrical power.
Optimizers change the constants in the algorithm running time. They cannot change the algorithm's order.
With the exception of high-performance computing, no one needs a super-optimizing compiler any more. Instead, they need to (re-)learn basic algorithm and data structure theory.
This gets me thinking about Nat Friedman's GNU-rope (Grope) project. I heard him talk at ALS back in '98 and then the project seemed to completely disappear. Searching on "gnu.rope" leads to a few mailing list postings asking "where'd it go", but no good information about the project.
The basic idea was to reorder the functions in an executable so that locality of reference was maximized and cache hits were increased. The result is less paging and better performance and memory usage.
The really interesting bit is that the optimization is based on the usage of the program being optimized-- that is to say that my Grope-optimized version of Mozilla might be different than yours based on my differences in usage (i.e. perhaps I browse with images turned off, etc).
The tie-in to the article here is that Nat's system, Grope, used simulated annealing to traverse the n-dimensional space of potential function arrangements and profiled the memory paging of the application as a fitness function of the new arrangement. It's not a GA-- but it's functionally similar.
So-- anybody know what happened to Grope? I'd imagine that a community would spring up around it fairly quickly given the relatively high number of "performance zealots" who are busy "tricking out" their Linux boxes by compiling everything from scratch (think Gentoo) optimized for their procesor. Now they can add the "spolier" or "racing stripe" of having exectuables specifically re-ordered for their personal usage patterns! *smirk*
The Attitude Adjuster, I hate me, you can too.
More importantly, tweaking code for heavy optimization is not usually a good job for humans. It's fine if you've got a piece of hardware that you have to tweak the last few percent performance out of for an application that will run for many CPU-years, but you're almost always much better off if you
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The "folks" you refer to is Professor Adrian Thompson of the University of Sussex. A paper describing his interesting experiment can be found here. It was actually a flawed FPGA chip he was programming.
Another example of this tendency of Genetic Algorithms to make use of helpful "flaws" in their environments can be found in the works of Karl Sims. A round-off error in his physics model resulted in some weird locomotion by a branch of virtual creatures.
You will find details of both examples in this entry on my Wetware blog.
While I don't disagree with your observations, I'm confused as to what they have to do with Ladd's project and the results shown.
Okay, find, I've written my decent quicksort. I know my basic algorithms and data structures. Now I would very much like to squeeze every last drop out of them. Here's a project that will tell me a good combination of options to use as a starting point. Don't tell me that I should just ignore all that and go re-re-re-learn basic algorithms. GCC is used by more than students.
More than that, it will tell the GCC maintainers which options should be on by default which currently might not be, and vice versa.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)