Slashdot Mirror


Genetic Algorithm Improves Shellsort

gstover writes "As a personal programming project recently, I used a genetic algorithm to find a better sequence of increments for Shellsort. It found at least one sequence that out-performs the best previously known by about 5 percent. Here is an article about it (also here). I believe this is the first time a genetic algorithm has actually improved Shellsort."

18 of 71 comments (clear)

  1. Send it to DKnuth by tetrode · · Score: 3, Funny

    You'll be included in THE book & be famous

    Mark

  2. Re:Nice and all by photon317 · · Score: 4, Interesting


    There may be some cases where shellsort is more desirable for the exact data being sorted, I don't really know for sure. The importance of this is that he has used a GA to better the optimization work of humans on shellsort. He has laid the groundwork and circumstantial proof out for others to do the same with other algorithms. Of course he evolved a set of constants more than an algorithm itself.

    The next logical place to go with this work, IMHO:

    1) Invent a concise fake machine language for sorting algorithms (a convenience to make the rest of this easier). It should have basic instructions used in manipulating and sorting arrays (move, compare, copy, branching, etc...).

    2) Write a "sort engine" that takes as input algorithms written in your fake language and uses them to sort things (outputting some performance numbers).

    3) Implement every known array sorting algorithm you can find in your little fake machine sort language.

    4) Let a GA evolve the whole algorithm by arbitrarily replacing bytes with other valid bytes from your specialized assembler language, starting with all the known sort algs as parents. Let it run until it stabilizes, using a relatively high mutation rate.

    Of course, the big problem is that if your language implements any kind of looping construct, or any other way that code can be re-executed (and it will almost have to), then you face a "halting problem" when testing each new child. The pratical workaround is of course to know that any reasonable algorithm must finish the sort in a certain bounded amount of cpu cycles, and terminate any children who take longer.

    5) Translate the winning candidate(s) custom machien source back into a generic language like C, and puzzle over exactly why it works so damn well.

    --
    11*43+456^2
  3. Re:Nice and all by Yokaze · · Score: 3, Informative

    Bottom-up mergesort does.
    Theta(n log(n)) instead of Theta(n^2) like the (standard) Quicksort.

    (Shellsort has Theta(n log(n)^2), IIRC)

    --
    "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
  4. From the article by one9nine · · Score: 4, Funny


    "Notice that the mating process has nothing to do with the constraints placed on valid organisms."

    I wish more women would read Slashdot. Chalk one up for the "Size Doesn't Matter" team.

    "I chose single-point crossover as the method of mating because it is simple & generic."

    I choose doggie style for the same reasons.

  5. Re:Nice and all by Da+VinMan · · Score: 4, Interesting

    GA is good for a lot of things. For instance, it was used to redesign diesel engines to be more efficient.

    The big problem with GA though, IIRC, is that the resulting solution is often incomprehensible to a human. I believe Bill Joy did some work with GAs and had comments along those lines (sorry, I couldn't find the quote). Consider for a moment though trying to troubleshoot code generated by a computer. Bad variable names would be just the start of your problems. The logic patterns employed would be essentially random to a human. Many of the patterns would be vestigial and wouldn't even be relevant, but you wouldn't even know that. Identifying the primary execution paths would be a huge chore... never mind actually understanding the basis for why the generated solution actually works.

    How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

    --
    Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
  6. Actually this HAS beed done before by stackdump · · Score: 4, Informative

    One of my professors at Midwestern State University (texas) has done this before. Although it seems you may have improved on his solution. (but I couldnt find a copy of his paper)

    "Faster Shellsort Sequences: A Genetic Algorithm Application", Proceedings of the International Society for Computers and Their Applications(ISCA), April 7-9, 1999. (With Shashidhar Yachavaram) Presented at the conference, Cancun Mexico.

  7. Re:Nice and all by Gumshoe · · Score: 5, Informative
    I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?


    You're generalising, which is an easy mistake. Quicksort is good but for many data sets, it is very likely suboptimal and another alogorithm would perform better. Shellsort is good for small data sets. In fact, a good quicksort implementation may use shellsort once the divisions get small enough.

    Another example:, quicksort is poor at sorting strings. This is because you have to perform a full strcmp() (or equivalent) at each comparison. For this reason, a modified radix sort is often used.
  8. Re:Impressive by lsdino · · Score: 3, Informative

    Speeding up shell sort is not very exciting and hardly useful. It's the slowest of the n^2 sorts [wku.edu]. I'm surprised no one has pointed this out. Where are the computer geeks?


    Quoting from your own link: "Invented by Donald Shell in 1959, the shell sort is the most efficient of the O(n2) class of sorting algorithms. Of course, the shell sort is also the most complex of the O(n2) algorithms."

    Which part of "shell sort is the most efficient of the O(n2) class of sorting algorithms" do you not understand?

    Of course, why speed up an O(n2) sort - why not try to speed up an O(n log n) sort like quicksort? try and figure out the best way to choose the pivot point!

  9. Re:Impressive by jovlinger · · Score: 4, Interesting

    Danny Hillis used his CM-1 to evolve sorting networks for known lengths. Using a predator-prey model (the predator won by generating sequences that the networks failed to sort), he evolved several "optimal" sorters.

    An obvious extension to generic lengths is to use these precomputed networks as recursion base cases for quicksort, instead of switching to selection sort for lengths x (x ~ 5 typically).

    IIRC shellsort works similarly: recursively sorting subsequences and merging results.

  10. Fitting... by wcbarksdale · · Score: 3, Funny

    A paper about genetic algorithms, written by a guy named Gene.

  11. Re:Nice and all by skaffen42 · · Score: 4, Funny

    How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

    I'm just happy my parents didn't have the same concerns when they "deployed" me! :)

    --
    People couldn't type. We realized: Death would eventually take care of this.
  12. efficency vs proof - Do we want this? by acomj · · Score: 3, Insightful

    Genetic algorithms are interesting in that they can become more efficient and modify themselves. I don't mean to sound like a ludite but is this a good way to develop algorithms?

    Is the efficiency gained worth the difficulty in proving the new genetical enhanced algorithm works?

    Its hard enough with vigorous testing to prove simple programs work without fail without introducing some random "Genetic" changes that may break the algorithm for some input cases.

    One of the problems with AI and self modifing algorithms is that as it learns /evolves you don't get the same output from the same input.. I guess thats the point, but I don't want my programs running like that.

  13. Re:Nice and all by platypus · · Score: 3, Interesting

    But you'd better make damn sure that your GA doesn't improve the algorithm specifically for your "virtual machine". I.e. if pointer arithmetics were very slow (pulling that example out of my ass, btw.), your GA might tend to penalize algorithms using them more often then others.

  14. Re:Nice and all by photon317 · · Score: 3, Interesting

    True true. Probably the answer would be instead measuring the real execution time in your engine, meaure it in number of various operations, and weight them by how expensive those operations are on a typical modern 32 bit processor.

    --
    11*43+456^2
  15. Shell Sort is proven, but good increments can't be by Sloppy · · Score: 3, Interesting
    I think you fundamentally misunderstand what is being done here. Shell Sort works, period. His project does not modify the algorithm in any way. There is no chance of him introducing an error.

    Shell Sort works by roughly sorting, then sorting a little finer, and so on, until you finely sort every piece of data, and hopefully at this final and potentially most time-consuming step, there isn't a lot of work to be done.

    The thing is, there's no platonic God-like perfect mathmatically-proven best choice of how fine or rough the early iterations should be. No matter what increments you use (within certain constraints) Shell Sort will still work every single time (the last step at increment 1 guarantees it) -- it's just a matter of performance. Nobody knows what increments work best, and you can't just mathmatically figure it out.

    ..Which is exactly what makes using a genetic algorithm such a good idea, for trying to come up with some values. GAs are great for optimization problems that defy analysis, as long as you can represent possible solutions as genotype strings and you can write a fitness function to evaluate them. That's what this guy did, and IMHO it is really cool.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  16. Re:Nice and all by Kynde · · Score: 3, Insightful

    I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

    Shell sort has better worst-case running-time than quick sort does.

    Most programming these days is all about the common case, but in real-time programming it's all about predictability. Thus there the mergesort is the standard choice over quicksort.

    --
    1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
  17. Re:Nice and all by doubtless · · Score: 3, Interesting

    You might want to check out Genetic Programming instead. It is the method of evolving programs instead of evolving solutions like Genetic Algorithm described here.

    There are some inherit disadvantages of GP, while being much more powerful than GA. One of them being the problem of code growth, usually after some generations, the organisms gets to be too large and starting to cosume too much memory and cpu resources to be feasible. One of the papers I wrote while taking Evolutionary Computation class while in college was the investigation and a 'solution/improvement' to the code growth problem.

    --
    geek page at KY speaks
  18. Re:Nice and all by Anonymous+Brave+Guy · · Score: 3, Informative
    It's quicksort n log(n) and shellsort n^2?

    It depends on what you're measuring.

    While Quicksort is O(nlogn) in typical use, its worst case performance is O(n), although most serious implementations would use a variation such as Introsort to limit the worst case behaviour.

    Mergesort is O(nlogn) in all cases. However, the constant factor is usually significantly greater than Quicksort, making the latter preferable for most uses.

    Shell sort is significantly slower for most situations, though IIRC it's pretty good with data that's already almost sorted.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.