Genetic Algorithms and Compiler Optimizations
mfago writes "Scott Robert Ladd has written an enlightening article and accompanying software package that utilizes genetic algorithms to optimize the optimization flags fed to a compiler. Those who have tried to squeeze the last drop of performance
from a code know that it can be very difficult to determine the set of optimization flags beyond -O3 (with gcc for example) that yields the best performance. Lots of background links included in the article."
Toss around terms like genetic algorithm and neural networks, and some are dazzled by the elegance and simplicity of it. Well, then there's reality, which is often neither elegant nor simple. In Go, there is a computer player that utilizes neural networks, yet it is near the middle of the pack.
A good dose of skepticism should accompany any examination of the dazzingly elegant solution.
-Libertarian secular transhumanist
That is, fiddle with settings at random until it's acceptable?
It turns out that hand tuning will ALWAYS get you the best optimizations, but human nature dictates that a certain portion of optimizations will never be found. Genetic algorithms will find some of these optimizations in a reasonable time, but not all. So the bottom line is, no code can ever be 100% optimized, unless it is compiled for weeks.
I met SRL when I was teenager visiting Gunnison Colorado for the Rocky Mountain SOG. That was over 15 years ago. He was well into GA's then. Good to see he's still at it!
This is a very neat application of genetic algoriths! Haven't seen something so interesting since this Tron cycle game.
Why now just write faster code, and use inline assembler when needed, possibly with SIMD and MMX instructions?
Those compilers are helpful, but as stated in article, squeeze the last drop of performance from a code is what they do.
Note to self: get smarter troll to guard door.
optimization flags beyond -O3 (with gcc for example) that yields the best performance.
Should actually read - optimization flags beyond -O3 (with gcc at least) that yields the best performance.
It actually seems like a good idea.Some problems I can imagine are optimization flags may have non speed related side-effects?
:) or maybe the base-line is already close to optimal, and so it isnt a hard problem?
However, this seems like a great candidate for GA's - a fitness function (ie speed of execution), and a nice binary vector of atributes (flags).
Interesting that the gains aren't that great it would seem to sugges that the flags don't do much
But Kudos nevertheless. No expert in the GA field, so its possible someones tackled this before, but if they haven't, extra Kudos.
Winton
An important thing to note is that this isn't coming up with new optimizations, but rather finding the optimal mix of pre-existing gcc optimizations.
It's an important distinction. When you're trying to do breed some really new system using Genetic Algorithms, it's possible to get some very fragile results, i.e. systems that do the job incredibly efficiently, but only under certain conditions...
For instance, I remember reading some folks were trying to "breed" a circuit using Genetic Algorithm like techniques that created a "pulser" using as few gates as possible. They got something that used like one tenth the number of logic gates of the previous low record...but didn't work when they moved the setup away from their desktop PCs. Turned out what they "discovered" was a sensitive RF receiver that picked up (I think) something like the normal 120V cycling, and utilized. Similarly, a lot of the bred ciruits are extremely sensitive to temperature and other conditions.
So breeding for efficiency can be a bit dangerous, you get things that rely on odd side-effects and environmental states. Though I think the method the article describes isn't much more dangerous than normal optimization selection, though admittedly more likely to stumble over some unusual flag combination that might break under certain circumstances.
In short, *if* GA is ever going to let software development become more like gardening than engineering, (something I've herd as a goal) we're going to have to find ways to apply very general evolutionary pressures, complex test environments.
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
Gentoomen, start your compilers.
Man, I say we need this 'cause my "Hello World" CRAWLS and the teacher needs it by tomorrow.
I'm not French, you insensitive Claude-hopper!
" Yet, despite that complexity, we need answers to important questions about modern, large-scale software. What sort of important questions? Well, consider the GNU Compiler Collection. I write articles that benchmark code generation, a task fraught with difficulties due to the myriad options provided by different compilers. For my benchmarks to have any meaning, I need to know which combination of options produces the fastest code for a given application."
I thought we already knew, the answer is "42", of course we would have to build a computer the size of earth to test it...
Sadly, these genetic algorithms only work for blue-eyed developers
Thank you. I enjoyed that intercourse immensely.
you poor bastard.
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
I would have liked to have seen genetic algorithms that optimize the generated object code, not the compiler flags. I never really thought those needed optimizing.
"And this is my boy, Sherman. Speak, Sherman." "Hello." "Good boy."
The concept is good in that it can take a predefined set of available optimizations and "figure out" (via some AI magic) optimal settings. This is good.
But, a more interesting problem might be more of a meta-compiler problem. Sure, we can optimize a set of options that we humans have determined/guessed are most influential in the building of efficient code. But what about running an analysis on the options themselves?
If we want to get real optimization, why not apply some data-mining theory to the different switches available within the compiler to generate the set of option switches selection?
Then, post selection of switches that are most influential in the compilation process, you can run this genetic algorithm to select "best switches" for the program you're compiling.
Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.
I was lucky and the fucker never did come here. If he had I would have taken the day off of work and tried my best to throw a pie at his face. Failing that I would have just stared at him and called him names.
Whenever I see the word "Kudos" in a /. post, I instantly think the poster is a karma whore. It sounds like "wdavies" has something interesting to say, but he's damaging his credibility by using Kudos.
There are a couple problems you can come into when getting this deep into compiler optimization flags. 1. Flags will optimize certain parts of code well,but may make other parts slower. 2. There is no guarantee the optimization carries over to other computers.
The use of GAs here seems like more of a gimmick, the same thing could have been achieved using conventional benchmarking.
Lastly, 10% or 20% improvements in compiler performance are of largely academic interest only these days most software's time is spent waiting on user input, disk IO, or network activity anyway?
... now it's not that the compiler is too slow on the computer, but the super-optimiser that runs the compiler 2000 times on the code to get the speed as fast as possible, that runs too slow on the computer...
:-)
That's all right then
Simon
Physicists get Hadrons!
I would really have enjoyed watching you get shot to death on CNN. Perhaps next time, you could make an effort, eh?
There is more information on his webpage. It's not strictly true -- compilers have moving targets (different architectures and hardware optimisations over time, greater complexity in languages, etc etc), and for important things like scientific applications compilers can really optimise code; but in general R&D towards compiler development seems to be sorely lacking compared to other areas.
I work on IA64, which is fundamentally designed to take advantage of smarter compilers. While there are some interesting projects (ORC for example) nothing is really close to ready to take over gcc/intel compiler. We really want something that compiles once, can profile code execution and then recompile to optimise based on the execution profile; something along the lines of what this article talks about but built right into the compiler.
No, really?
As such, is there any justification for using a GA to do this rather than, say, simulated annealing? I'd would be interested to see a comparison of the two approaches. Especially in light of papers like this one.
Any time you consider using a broad-front search algorithm like a GA, a neural net, or simulated annealing, try plain-hill climbing first. If you try any broad-front search, compare it with plain-hill climbing. Only if the space being searched is dominated by local maxima (but not too many of them) do the broad-front algorithms help. And they're always slower.
If this guy had demonstrated that a GA did better than a plain hill-climber, he'd have an interesting result. But he hasn't demonstrated that.
Wouldn't it be easier to just ask Stallman what each one "optimizes" and which ones are the best to use? It would probably be a lot quicker... uh nevermind.
I have a bone to pick with the summary. -O3 probably won't help your performance in the first place, and will likely degrade it. -O2 or -Os, with maybe a few -mxxxx or -fxxxx flags, would be better. Of course, optimizing your code by making it not do stupid things is the best. (Though compiling with -O2 is pretty much standard and a good thing.)
+3 Interesting? The post is way off topic.
Genetic algorithms are completely heuristic. It sounds cool that the idea of evolution is being used for optimization, but there is no real logic to it. I would like to see evidence that genetic algorithms produce any better results than other methods on a reasonably large class of problems.
Call me grumpy, but I say they are a fad.
At last! Someone who starts their graphs from zero!
Having worked on applying GAs to multi-objective optimization, I don't belive that this technique can be used effectively to optimize most programs.
The main issue is to compare the individuals generated by the genetic algorithm. To do so, we need to both compile the program under the specified settings, and then to be able to benchmark its performance. In my current job, a full build of our main product takes over 12 hours on an 8 CPU machine. Using a pretty conservative estimate, 100 generations with 100 individuals each, we'd be talking about more than a year of CPU time for a single run of the algorithm!
Even if we could ignore the amount of time required for compilation, we still have a second, more important flaw: Most programs out there are not really that easy to benchmark automatically. Database applications might need to go back to a known DB state to be able to run the benchmark faily. Also, server apps need to have the exact same load for every test if we want to be able to compare two executables fairly. This problem is increased when many compiler options will just create a 1% performance improvement or so. A poorly run system could lead the comparison function to just pick the wrong executable if the two executables didn't run in the exact same conditions.I see how using GE for this task has a high coolness factor, and how it might even be usable for applications that are by nature easier to benchmark, but don't expect this technique to be applicable to enterprise-sized server applications, or even most GUI based apps any time soon.
I've always found the largest problems with optimizers to be that they sometimes generate incorrect code. He doesn't seem to test for this. Of course, maybe the benchmarks are smaller/less kludgy than the n x 10,000 line codes I've tried to optimize with flags.
Just because you don't understand where the term comes from doesn't means it's wrong.
Tom
Someday, I'll have a real sig.
It's because you don't know how to use a compiler, dipshit.
FUCK YOU YOU VERMIN!
take you tratorous verminous garbage OUTTA HERE you SON OF A BITCH
Of course, buggy code is more likely actually to fail under aggressive optimization. I've certainly had to maintain lots of buggy code, although of course I never write any of it myself. (Did you know that
yields undefined results? The compiler is allowed (and under -O3, encouraged) to generate code that will erase your disk and impregnate your sister.)I wonder if Mr. Ladd checked the results of the programs he ran.
If there are new entries to the top500 supercomputers list at every update, it is because there are still a HUGE amount of highly computational-intensive tasks to achieve
-------------------------------------------------
Programming is good for health
Interestingly, I read a story in Debian Weekly News [1] where someone noticed that sometimes compiling a C++ program with -O3 is worse than with -O2.
:o)
The moral of the story? Just because you're running Gentoo and you've compiled everything yourself doesn't mean you have the fastest possible system. You actually have to know what you're doing, too.
[1] http://www.debian.org/News/weekly/2003/44/
http://www.talknerdy.org
...because they were able to pick out combinations of optimizations that measuring each optimization individually wouldn't have discovered.
Consider the case in huffbench when
-fmove-all-movables, -freduce-all-givs and -ftracer were discovered to result in great improvements when used with the other GA-selected optimizations -- but not when applied directly to -O3.
Sure, these situations may be specific to the test cases -- but that's not to say his software can't be used to evaluate real-world scenarios as well, such that companies doing a final release compile of their code let a GA churn for a few days to determine the best flags to use.
That's a Good Thing, no?
I want to know why my tiny Hello World shows up at 45KB
try visual c++, it'll be 45Mb and won't even work unless you reboot.
The compiler ought to optimize this stuff anyway! How about "set suck=(yes|no)" and "prefer=(speed|size)" as options. Let the compiler and its designers come up with what is required to meet (or approach) a programmer's wishes.
I want to delete my account but Slashdot doesn't allow it.
The Green Hills C compiler had all kinds of optimization techniques but they also had a way to measure the assembler output. One method would eliminate all symbols and make huge code that ran inline. The code was fast but used up a lot of disk space. They had many optimization options and dozens of flags that could be set so that the compiler would actually give you what you wanted. Analyzing the assembly code was a great way to learn the ins and outs of their compiler.
Banjo - The more I know about Windoze, the more I love *nix
Want it small?
Beyond -O3 we have -OS and -O2. Both generates better codes on workstations. And -OS might be the best on Celerons and VIA C3 CPUs.
How will the synthetic algorithm determine if the app is for a server or a workstation ? And how to handle dual purpose machines ?
Optimizers change the constants in the algorithm running time. They cannot change the algorithm's order.
With the exception of high-performance computing, no one needs a super-optimizing compiler any more. Instead, they need to (re-)learn basic algorithm and data structure theory.
I thought on Slashdot we're supposed to wait for someone to mention Chess before we bring up Go?
Ah yes, but with the Kasparov reversal, you can jump from "Go" straight to "Mornington Crescent"
"She's furniture with a pulse"
This gets me thinking about Nat Friedman's GNU-rope (Grope) project. I heard him talk at ALS back in '98 and then the project seemed to completely disappear. Searching on "gnu.rope" leads to a few mailing list postings asking "where'd it go", but no good information about the project.
The basic idea was to reorder the functions in an executable so that locality of reference was maximized and cache hits were increased. The result is less paging and better performance and memory usage.
The really interesting bit is that the optimization is based on the usage of the program being optimized-- that is to say that my Grope-optimized version of Mozilla might be different than yours based on my differences in usage (i.e. perhaps I browse with images turned off, etc).
The tie-in to the article here is that Nat's system, Grope, used simulated annealing to traverse the n-dimensional space of potential function arrangements and profiled the memory paging of the application as a fitness function of the new arrangement. It's not a GA-- but it's functionally similar.
So-- anybody know what happened to Grope? I'd imagine that a community would spring up around it fairly quickly given the relatively high number of "performance zealots" who are busy "tricking out" their Linux boxes by compiling everything from scratch (think Gentoo) optimized for their procesor. Now they can add the "spolier" or "racing stripe" of having exectuables specifically re-ordered for their personal usage patterns! *smirk*
The Attitude Adjuster, I hate me, you can too.
More importantly, tweaking code for heavy optimization is not usually a good job for humans. It's fine if you've got a piece of hardware that you have to tweak the last few percent performance out of for an application that will run for many CPU-years, but you're almost always much better off if you
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
This is a new low. Next we'll see papers
on using genetic algorithms to find the best
channel on TV.
What would be interesting is a genetic system
to select meso-scale optimizations in a
compile back-end, for example.
-I like my women like I like my tea: green-
as make up the core of much custom scientific computing packages (or even libraries like the "Gnu Scientific Library")
Basically the jist is atlas "empirically" (read: use an optimizer for instance like GA, though empirical may actually mean brute force in this case) to optimize various parameters that will affect things like optimize the routine for the cache size of the processor etc. The cool thing about this, is they can get w/in 10% of hand machine coded BLAS/LAPACK libraries w/out the pain!
The "folks" you refer to is Professor Adrian Thompson of the University of Sussex. A paper describing his interesting experiment can be found here. It was actually a flawed FPGA chip he was programming.
Another example of this tendency of Genetic Algorithms to make use of helpful "flaws" in their environments can be found in the works of Karl Sims. A round-off error in his physics model resulted in some weird locomotion by a branch of virtual creatures.
You will find details of both examples in this entry on my Wetware blog.
Why not skip the recompile and perform profiling based optimization directly on the machine code?
For the five benchmarks he used,
-O3 was at least a little faster
than -O2 in every one of them!
You pusses and your classic computing algorithms. A quantum algorithm for optimizing compiler optimizations is the only way to go in Theta(1) time!
Seriously, this genetic algorithm approach is like most other approaches using GAs: stupid.
How about Roomba, I supect Roomba program is from GP (not GA) or some other evolution programming algo.
and yes, that really does consist of fiddling with the settings (each generation born counts as a twist to one of the control dials) until it's acceptable (Deer 2.0 can outrace Wolf 1.9, now Wolf needs to breed a 2.0 before they all starve).
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
While I don't disagree with your observations, I'm confused as to what they have to do with Ladd's project and the results shown.
Okay, find, I've written my decent quicksort. I know my basic algorithms and data structures. Now I would very much like to squeeze every last drop out of them. Here's a project that will tell me a good combination of options to use as a starting point. Don't tell me that I should just ignore all that and go re-re-re-learn basic algorithms. GCC is used by more than students.
More than that, it will tell the GCC maintainers which options should be on by default which currently might not be, and vice versa.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
The first rule of evolutionary techniques is: if there is a better-known, more specific optimization technique for the problem at hand, use it instead. In this case, there wasn't one.
Simulated annealing has a lot in common with GAs, but they tend to not do so well on multidimensional discontinuous fitness landscapes.
I'm sure annealing could eventually be used to come up with a solution; I'm just amazed whenever somebody does something cool with Tool X, and all of slashdot immediately jumps all over it with, "why didn't they use [my favorite tool]? why this one?"
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Good, because that's not the original intent of the project. When he first contacted the GCC maintainers about the project which would later be called Acovea, the idea was to see which underlying -f/-m options shod be turned on by the -O? options by default. More and more projects using 3.2 and 3.3 were using "-Osomething -fno-something -msomething" so he took the 3.4 (in development) code to see what should additionally be turned on and what should be turned off instead.
While you can use it to tune for particular programs, and particular patterns of use, it will mostly be useful to GCC itself.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
The PostgreSQL optimizer has been based on GA for some time now.
As we all know, if your code is too slow then your compiler isn't smart enough.
...I would expect that of this was really something to write home about, it would have been published somewhere like PLDI or Usenix. Was it?
"I'm an old-fashioned type of guy. I worship the Sun and Moon as gods. And fear them."
In other words I randomly swapped pairs of instructions that could be swapped, if it saw an improvement it kept it and if it saw a degradation it rejected it with a probability depending on how bad it made things.
After hours of running it would sometimes trim a clock cycle or two per loop off a short assembler routine.
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
There are domains where squeezing every last cycle out of the code matters. Many of us no longer work in those domains. Most of the code I've written in the last 4 years was developed and executed in an environment with essentially unlimited cycles, memory, and disk. The only exception was one project that got database bound, and compiler optimization won't help that.
For programmers who are no longer resource limited, using a language that optimizes for the human rather than the computer is a good idea. A language with garbage collection, low syntactic sugar, functional constructs like closures, and no separate compile phase make for happy code and happy programmers.
An interesting benefit of working in a humane language is that it can becomes easier to use a better algorithm. Many algorithms depend upon data structures that are difficult to implement in the land of DIY memory management, but a breeze in a more humane language. This can make a solution in the apparently unoptimal language outperform the solution in the optimal language. Of course, we *could* go faster using C or C++, but only if we have the extra time it takes to write the code.
Don't optimize for the computer unless the computer is the critical constraint. And, even then, only optimize the part that is the actual constraint.
Otherwise, optimize for the human. We're more important than the computer these days.
"Classic" genetic algorithms are little more than a pretty efficient search algorithms.
GAs are nothing more than search algorithms, but then pretty much all AI is search.
The complexity of any modern programming language is so high that using genetic algorithms to search for "a program in C that calculates fibonacci numbers" seems completely outlandish.
It may seem outlandish to you but using a form of GA known as Genetic Programming, programs to calculate Fibonacci numbers have already been evolved. It is certainly possible to generate C, assembly, and code from other languages in a meaningful way.
The cost in electricity to find such a program would be higher than just giving the same specifications to a programmer.
Remember, natural selection in the real world, which evolutionary algorithms are an abstraction of, have produced those programmers.
While I applaud to using GA to optimize the compiler settings, which basically tweaking handling of loops and function calls, how about figuring out which is most optimal way to store different variables, in registers, caches L1 and L2, RAM, HDD, tape? FYI, that is a known NP-hard problem.
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Interesting but that sounds kind of scary. Once you attempt to find optimizations that actually try to find other equivalent algorithms, you start treading into the dangerous realm of undecidability and Turing completeness. Consider what optimizing Turing-complete systems like partial recursive functions or their equivalent Turing machines entails. The first, and most important thing you need to do is decide whether or not some Turing machine runs faster (or more generally has a better performance metric) than some other Turing machine. You run it, and do your measurements. But then, how do you know whether there might exist some input for which your TM's will eventually enter an infinite loop? You can't know that, it's undecidable. It's the halting problem, and I imagine you'll run into this all the time when you try to do this level of optimization.
Any system that tried to optimize recursive functions or any other equivalent Turing-complete formalism according to some optimization metric (e.g. space or time complexity) is at most attempting to apply heuristics to decide the undecidable. I seriously doubt that such an approach would ever be useful enough to clean up after sloppy "programmers" who never bothered to study proper algorithms and data structures, and recode a quicksort where a bubble sort was originally written. At best, I imagine it would produce marginal improvement while increasing compile time dramatically, and would be totally impractical for any sort of useful work. I seriously doubt such techniques were ever used in anything but research compilers for toy languages.
Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
The halting problem says that it is impossible to write a program that will decide halting for every single possible program. The real problem is a certain class of pathological problems. There's no reason to believe that a specific program devised by a GA could not be proven to halt. It is also quite possible to prove that certain optimizaitons will preserve certain properties, such as halting. It is not possible to prove whether any given optimization will preserve halting.
These projects tend to be CPU bound and
every CPU cycle counts. I understand some
have been hand coded into assembler...but
not all.
It should be possible to apply the same technique to other performance-tuning things like buffer sizes, or even choosing between two different algorithms for the same task. Perhaps a program could have a single header file performance.h containing // buffer size for reading // amount of new memory to allocate in a chunk // threshold for switching from quicksort to // merge sort when you have this few elements //
const size_t read_bufsiz = 1000;
const size_t chunk_size = 5000;
const unsigned int merge_sort_thresh = 10;
and then these values would be tuned at the same time as compiler flags.
-- Ed Avis ed@membled.com
I'm with you 99%.
Within the next few days, I'll be adding an appendix and FAQ to my article, in light of all the interesting comments I've received here and in other forums.
Here's a big "Thank You" for the intelligent feedback I've received. And as for the doubters and naysayers: This is what peer review all about! If you think hill-climbing, or a completely random serach, or some other tool might work better than my creation, then, by all means, code and post it.
Competition and diversity are the fuel of evolution, after all.
All about me
i read this article before sumewhere few years
... hmmm ... anyway would have
... using a genetic
back... *scratch head*
really interessting! gcc is a compiler (never
used it) for c
been interessting to note the compile time TOO
for differen comipler flags used. maybe there's
correlation between how-long it takes for
compilation and the actual speed of the program.
he did mention something about compiled program
size which also should be taken into concideration.
back to the compiler
algorithem approach to sort out comipler flags.
dunno sounds like this guy is seriously standing
at the edge of artificial intelligence. i dunno
how this compiler flag optimization fits in to
AI stuff but i def. got this gut feeling while
reading the article.
gen. algorithem + compiler(flags) + ? = AI?
The AI folks and the Scalar Compiler folks at Rice are working on this project as well. Take a look, and scroll to the bottom for links to academic publications if you're interested.
Their initial objective was to reduce the power consumption of compiled programs, but notably the GA-discovered compiler flags increase the performance of most programs as well. Additionally they have experimented with a hill-climbing search for a particular program and have discovered that the search space is very bumpy. However the GA can beat the hillclimber by 20% performance in 200 generations (genepool=100, so search 20,000 trials) versus 2.8 million trials in the hillclimber.
I remember Andy McKay working with Keith on this at Rice in the summer of '98. Do you happen to know why the research has been going as slowly as it has? Are they just not very interested in pushing it, or are people just busy with cooler things, or what?
By the way, nice to see you around. What are you up to these days? Send me an email sometime. My alumni address will still work.
-jacob
A support vector machine is supposed to have no local-minima problem.