Genetic Algorithm Improves Shellsort

Send it to DKnuth by tetrode · 2002-12-09 05:43 · Score: 3, Funny

You'll be included in THE book & be famous

Mark

Impressive by trajano · 2002-12-09 06:25 · Score: 2, Interesting

Personally, I am quite impressed with the concept of Genetic Algorithms.

I could imagine your shell sort could be applied to database technologies by creating a separate low priority thread that would generate the "gene pool" to improve the indexing of the tables in the background.

On a side note I am learning about genetic algorithms with hopes that I may work on it when I take my Masters in Computer Science.

--
Archie - CIO-for-hire :-)

Re:Impressive by lsdino · 2002-12-09 08:05 · Score: 3, Informative

Speeding up shell sort is not very exciting and hardly useful. It's the slowest of the n^2 sorts [wku.edu]. I'm surprised no one has pointed this out. Where are the computer geeks?

Quoting from your own link: "Invented by Donald Shell in 1959, the shell sort is the most efficient of the O(n2) class of sorting algorithms. Of course, the shell sort is also the most complex of the O(n2) algorithms."

Which part of "shell sort is the most efficient of the O(n2) class of sorting algorithms" do you not understand?

Of course, why speed up an O(n2) sort - why not try to speed up an O(n log n) sort like quicksort? try and figure out the best way to choose the pivot point!
Re:Impressive by jovlinger · 2002-12-09 08:08 · Score: 4, Interesting

Danny Hillis used his CM-1 to evolve sorting networks for known lengths. Using a predator-prey model (the predator won by generating sequences that the networks failed to sort), he evolved several "optimal" sorters.

An obvious extension to generic lengths is to use these precomputed networks as recursion base cases for quicksort, instead of switching to selection sort for lengths x (x ~ 5 typically).

IIRC shellsort works similarly: recursively sorting subsequences and merging results.
Re:Impressive by pirap · 2002-12-10 03:09 · Score: 2, Informative

You should not look at genetic algorithms, but perhaps upon a newer "family" known as PSO (Particle Swarm Optimization). It's obviously very similar to GAs but takes care of cooperation as part of the development of solutions. This is the "weak" point of GAs as they only work through "battling" other solutions.

Swarms are a concept which was "invited" in 1995 I believe. Read some articles about it.

Historic moment! by 0x0d0a · 2002-12-09 06:32 · Score: 2

This may be the first time something potentially important hit Slashdot before being reported elsewhere! :-)

--
May we never see th

Re:Historic moment! by 0x0d0a · 2002-12-09 20:22 · Score: 2

Okay, a first for something potentially important *and* plausible.

--
May we never see th

Re:Nice and all by jalet · 2002-12-09 07:01 · Score: 2, Informative

quicksort doesn't preserve the order of duplicate keys, IIRC

--
Votez ecolo : Chiez dans l'urne !

Re:Nice and all by photon317 · 2002-12-09 07:08 · Score: 4, Interesting

There may be some cases where shellsort is more desirable for the exact data being sorted, I don't really know for sure. The importance of this is that he has used a GA to better the optimization work of humans on shellsort. He has laid the groundwork and circumstantial proof out for others to do the same with other algorithms. Of course he evolved a set of constants more than an algorithm itself.

The next logical place to go with this work, IMHO:

1) Invent a concise fake machine language for sorting algorithms (a convenience to make the rest of this easier). It should have basic instructions used in manipulating and sorting arrays (move, compare, copy, branching, etc...).

2) Write a "sort engine" that takes as input algorithms written in your fake language and uses them to sort things (outputting some performance numbers).

3) Implement every known array sorting algorithm you can find in your little fake machine sort language.

4) Let a GA evolve the whole algorithm by arbitrarily replacing bytes with other valid bytes from your specialized assembler language, starting with all the known sort algs as parents. Let it run until it stabilizes, using a relatively high mutation rate.

Of course, the big problem is that if your language implements any kind of looping construct, or any other way that code can be re-executed (and it will almost have to), then you face a "halting problem" when testing each new child. The pratical workaround is of course to know that any reasonable algorithm must finish the sort in a certain bounded amount of cpu cycles, and terminate any children who take longer.

5) Translate the winning candidate(s) custom machien source back into a generic language like C, and puzzle over exactly why it works so damn well.

--
11*43+456^2

Re:Nice and all by Pentagram · 2002-12-09 07:09 · Score: 2, Informative

Quicksort is slow if the data it is sorting are already almost sorted, a situation which (IIRC) shellshort is particularly efficient at dealing with.

Re:Nice and all by Yokaze · 2002-12-09 07:14 · Score: 3, Informative

Bottom-up mergesort does.
Theta(n log(n)) instead of Theta(n^2) like the (standard) Quicksort.

(Shellsort has Theta(n log(n)^2), IIRC)

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"

From the article by one9nine · 2002-12-09 07:18 · Score: 4, Funny

"Notice that the mating process has nothing to do with the constraints placed on valid organisms."

I wish more women would read Slashdot. Chalk one up for the "Size Doesn't Matter" team.

"I chose single-point crossover as the method of mating because it is simple & generic."

I choose doggie style for the same reasons.

Re:Nice and all by Da+VinMan · 2002-12-09 07:18 · Score: 4, Interesting

GA is good for a lot of things. For instance, it was used to redesign diesel engines to be more efficient.

The big problem with GA though, IIRC, is that the resulting solution is often incomprehensible to a human. I believe Bill Joy did some work with GAs and had comments along those lines (sorry, I couldn't find the quote). Consider for a moment though trying to troubleshoot code generated by a computer. Bad variable names would be just the start of your problems. The logic patterns employed would be essentially random to a human. Many of the patterns would be vestigial and wouldn't even be relevant, but you wouldn't even know that. Identifying the primary execution paths would be a huge chore... never mind actually understanding the basis for why the generated solution actually works.

How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

--
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!

Actually this HAS beed done before by stackdump · 2002-12-09 07:19 · Score: 4, Informative

One of my professors at Midwestern State University (texas) has done this before. Although it seems you may have improved on his solution. (but I couldnt find a copy of his paper)

"Faster Shellsort Sequences: A Genetic Algorithm Application", Proceedings of the International Society for Computers and Their Applications(ISCA), April 7-9, 1999. (With Shashidhar Yachavaram) Presented at the conference, Cancun Mexico.

Re:Actually this HAS beed done before by gstover · 2002-12-09 11:48 · Score: 2, Interesting

The only GA-originated increments for Shellsort that I could find online was the one in Crew fall 2001 report, but that sequence didn't perform very well, & they didn't document their technique except to say that it wasn't a full-blown GA. (They placed some limitations on it. They also mention that Simpson & Yachavaram used only a limited GA.)

I wondered if a GA that fully manipulated bit-strings could produce a sequence as good as Sedgewick's, & I also wanted to document the technique so someone else interested in GAs could implement their own or at least be a little entertained with how it was done.

Other sorting attempts using GA's by Tablizer · 2002-12-09 07:20 · Score: 2, Informative

In Melanie Mitchell's wonderful book, "An Introduction to Genetic Algorithms", page 21-27, The author describes using genetic algorithms to find optimal patterns for "sorting networks". They are used for sorting chips (hardware) to sort in chunks of say 16 or 64 units. The order of comparisons is important. Using genetic algorithms, they were able to come very close to the best sequences found by humans. (I don't know if hardware sorting is still used. I had the impression that it is a mainframe thing.)

Thus, optimizing sorting algorithms with GA is not new in itself.

--
Table-ized A.I.

Re:Nice and all by Gumshoe · 2002-12-09 07:36 · Score: 5, Informative

I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

You're generalising, which is an easy mistake. Quicksort is good but for many data sets, it is very likely suboptimal and another alogorithm would perform better. Shellsort is good for small data sets. In fact, a good quicksort implementation may use shellsort once the divisions get small enough.

Another example:, quicksort is poor at sorting strings. This is because you have to perform a full strcmp() (or equivalent) at each comparison. For this reason, a modified radix sort is often used.

Quicksort vs Timsort by tdelaney · 2002-12-09 08:21 · Score: 2

Samplesort vs Timsort.

Intro
-----
"This describes an adaptive, stable, natural mergesort, modestly called
timsort (hey, I earned it ). It has supernatural performance on many
kinds of partially ordered arrays (less than lg(N!) comparisons needed, and
as few as N-1), yet as fast as Python's previous highly tuned samplesort
hybrid on random arrays."

Just like to add... by Hubert_Shrump · 2002-12-09 10:45 · Score: 2

From a GA fan, and seeing all the articles on distributed computing...

You may also be interested in Tierra. Open-ended networked evolution - it's pretty nifty.

Got to love watching the computer solve problems for you. Ahh...

--
Keep your packets off my GNU/Girlfriend!

Also see comb sort by flockofseagulls · 2002-12-09 11:14 · Score: 2, Interesting

The comb sort is a variation of the bubble sort algorithm that achieves N log(N) execution times, similar to heapsort. The algorithm is simple and memory efficient, compared to Quicksort.

http://cs.clackamas.cc.or.us/molatore/cs260Spr01 /c ombsort.htm

http://world.std.com/~jdveale/combsort.htm

Fitting... by wcbarksdale · 2002-12-09 11:20 · Score: 3, Funny

A paper about genetic algorithms, written by a guy named Gene.

Re:Nice and all by skaffen42 · 2002-12-09 14:51 · Score: 4, Funny

How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

I'm just happy my parents didn't have the same concerns when they "deployed" me! :)

--
People couldn't type. We realized: Death would eventually take care of this.

efficency vs proof - Do we want this? by acomj · 2002-12-09 15:27 · Score: 3, Insightful

Genetic algorithms are interesting in that they can become more efficient and modify themselves. I don't mean to sound like a ludite but is this a good way to develop algorithms?

Is the efficiency gained worth the difficulty in proving the new genetical enhanced algorithm works?

Its hard enough with vigorous testing to prove simple programs work without fail without introducing some random "Genetic" changes that may break the algorithm for some input cases.

One of the problems with AI and self modifing algorithms is that as it learns /evolves you don't get the same output from the same input.. I guess thats the point, but I don't want my programs running like that.

Re:efficency vs proof - Do we want this? by Omnifarious · 2002-12-10 03:42 · Score: 2

In this case, the algorithm is guaranteed to work with a wide range of parameters governing its precise functioning. The genetic algorithm is only being used to search the parameter space for the most efficient parameters. Since the algorithm will work no matter which parameters you pick, this is a perfectly fine way to use genetic algorithms.

Also, genetic algorithms aren't self-modifying.

--
Need a Python, C++, Unix, Linux develop

Re:Nice and all by platypus · 2002-12-10 00:23 · Score: 3, Interesting

But you'd better make damn sure that your GA doesn't improve the algorithm specifically for your "virtual machine". I.e. if pointer arithmetics were very slow (pulling that example out of my ass, btw.), your GA might tend to penalize algorithms using them more often then others.

Re:Nice and all by photon317 · 2002-12-10 03:28 · Score: 3, Interesting

True true. Probably the answer would be instead measuring the real execution time in your engine, meaure it in number of various operations, and weight them by how expensive those operations are on a typical modern 32 bit processor.

--
11*43+456^2

Shell Sort is proven, but good increments can't be by Sloppy · 2002-12-10 06:00 · Score: 3, Interesting

I think you fundamentally misunderstand what is being done here. Shell Sort works, period. His project does not modify the algorithm in any way. There is no chance of him introducing an error.

Shell Sort works by roughly sorting, then sorting a little finer, and so on, until you finely sort every piece of data, and hopefully at this final and potentially most time-consuming step, there isn't a lot of work to be done.

The thing is, there's no platonic God-like perfect mathmatically-proven best choice of how fine or rough the early iterations should be. No matter what increments you use (within certain constraints) Shell Sort will still work every single time (the last step at increment 1 guarantees it) -- it's just a matter of performance. Nobody knows what increments work best, and you can't just mathmatically figure it out.

..Which is exactly what makes using a genetic algorithm such a good idea, for trying to come up with some values. GAs are great for optimization problems that defy analysis, as long as you can represent possible solutions as genotype strings and you can write a fitness function to evaluate them. That's what this guy did, and IMHO it is really cool.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:Shell Sort is proven, but good increments can't by acomj · 2002-12-10 08:54 · Score: 2

I was extrapolating to the "AI Future" of much more complex algorithms being evaluated (evolved) in more fundamental ways then just changing parameters.

My frame of mind is has changed. Right now is I'm working on very time constrained real time stuff, and when I have to do it in X seconds I like to know what the worse case is, not best average case.

The genetic algorithm sounds like and iterative test to try and find the optimum increments for shell sort. Really nothing revolutionary.

I couldn't recall from the test description, but it doesn't seem that different arrays to sort were used to evolve the parameters of the sort (increment list). If this is the case the solution is just a sort optimized for a particular dataset and of little use.

I would be interested to see how well the algorithm would work elvolved against different random data sets (almost in order, always reverse order, random).

And then compared with others Standard increments.

They could have a competion like they have to come up with good prisoners delima algorithms, but instead come up with the best shell sort increments...

Re:Nice and all by Kynde · 2002-12-11 00:15 · Score: 3, Insightful

I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

Shell sort has better worst-case running-time than quick sort does.

Most programming these days is all about the common case, but in real-time programming it's all about predictability. Thus there the mergesort is the standard choice over quicksort.

--
1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW

Re:Nice and all by Kynde · 2002-12-11 00:25 · Score: 2

I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

Quicksort is good for a lot of cases, but not for all. Yet one more good example (not mentioned here yet) would be the presense of knowledge or how unsorted things really are. Meaning that if you know something extra about the "unsorted" state of the data quicksort can be outperformed easily.

Classic example is to arrange bank payments based on the actual pruchase date using the credict card. As each store sends the bill with some variation the bills arrive in time unsorted. On the average they're still almost sorted, but not quite. For such a case quicksort would perform quite horribly in respect to what could be done.

True randomness is far more seldom encountered, thus other means for sorting can also be implemented more often that common people think.

--
1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW

Re:Nice and all by doubtless · 2002-12-11 02:31 · Score: 3, Interesting

You might want to check out Genetic Programming instead. It is the method of evolving programs instead of evolving solutions like Genetic Algorithm described here.

There are some inherit disadvantages of GP, while being much more powerful than GA. One of them being the problem of code growth, usually after some generations, the organisms gets to be too large and starting to cosume too much memory and cpu resources to be feasible. One of the papers I wrote while taking Evolutionary Computation class while in college was the investigation and a 'solution/improvement' to the code growth problem.

--
geek page at KY speaks

Re:Nice and all by Anonymous+Brave+Guy · 2002-12-11 08:08 · Score: 3, Informative

It's quicksort n log(n) and shellsort n^2?

It depends on what you're measuring.

While Quicksort is O(nlogn) in typical use, its worst case performance is O(n), although most serious implementations would use a variation such as Introsort to limit the worst case behaviour.

Mergesort is O(nlogn) in all cases. However, the constant factor is usually significantly greater than Quicksort, making the latter preferable for most uses.

Shell sort is significantly slower for most situations, though IIRC it's pretty good with data that's already almost sorted.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Re:Genetic Algoritm wistfulness? by Anonymous+Brave+Guy · 2002-12-11 08:16 · Score: 2

Trust me, you do not want to suddenly unleash an efficient algorithm for factorising into large primes on the world. For a start, security as we know it would more-or-less disappear overnight, if only because so much important data has been encrypted using algorithms that assume such factorisation to be hard.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Re:Nice and all by vidarh · 2002-12-12 02:03 · Score: 2

You're confusing GA and GP (genetic programming). GA usually refer to a case where the program is fixed, and the genes only consists of parameters to the program. With genetic programming, your issues are valid, however they can be worked around:

- A lot of simplification of a program can be automated. Some of that simplification work will require significant work to implement, but it's doable. Other parts of the simplification can be done by changing the fitness function once you have an efficient algorithm: In addition to performance or whatever criteria used, you add measurements for program size and complexity as secondary influences on the fitness.

- Robustness of the language the program is generated in can be strengthened. Typically GP uses custom domain specific languages. This both simplifies understanding, and prevent crashes.

- The experiments can be rerun with additions to the language used that group constructs that are complex but understood in order to attempt to simplify the solutions. The population can be steered towards using these constructs by rewarding them in the fitness function.

All in all, GP is very promising for certain types of problems, but it shouldn't be understood at "press this button and out comes a polished working program", rather as a tool to approach problems that are hard to analyse (especially where there is no 100% "correct" solution, or a correct solution is extremely hard to find, but gradual approximation has the potential of working well enough). GA and GP also has great potential wherever requirements change slowly over time and the software needs to adapt, as a way of preventing manual tweaking.

Imagine a lift control GA, for instance (actually used as an example in Dr Dobbs some time ago), where a near optimal solution could be used as a starting point for the lifts behavior. However "traffic patterns" in a building change - new tenants move in, old ones move out, departments move and create changes in which floors have the heaviest traffic, and which floors the traffic is between, and a GA could quite possibly be used to automatically adapt to such changes.

On a sidenote, before anyone raise concerns over letting a "random" program run a lift, the whole point here is that a GA isn't random. You use a hard coded algorithm that enforce rules for what is acceptable and what not. For a lift, of course, you'd have low level requirements such as breaking at a specific speed, and high level requirements such as newer changing direction before all the floors requested in the current direction have been visited.

The GA will only adapt the parameters you choose to surrender control of, such as whether or not the lifts should move when not requested (some of them moving to a specific floor for lunch time, for instance), and how long to wait before doing so.

I very much doubt that by Anonymous+Brave+Guy · 2002-12-13 12:03 · Score: 2

The current system needs to be replaced, and that is the best method I can think of to replace it.

I truly and sincerely believe that you should think about that claim, and then think a whole lot harder about what you'd like to see replace it. There are plenty of places in the world, still, where the protections that we take for granted don't exist. I'm guessing you wouldn't like to be there a whole lot, and releasing all the stuff that's currently protected is a one-way trip.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Re:Protections by Anonymous+Brave+Guy · 2002-12-13 16:12 · Score: 2

What protections do you take for granted?

You've seen and heard a lot about terrorism lately, I assume. When the intelligence agencies screw up, and something like Sept 11 happens, you hear all about it. How many terrorist incidents do you think such agencies prevent for every one that happens? I'm betting it's quite a few.

I realise that a lot of intelligence officers just read the local papers and use their brains, but I'm guessing that on-the-ground in-with-the-bad-guys people are still pretty important as well. If you blow the cover of every single intelligence officer in the world, who do you think it's going to hurt more, us or the bad guys?

Next up... You have a bank account, right? How would you feel about not only letting anyone who wanted to see the details, but letting any script kiddie with half a brain play with the money itself? If you remove the encryption that's used when things like transfers are done electronically, what exactly is going to stop them?

Now, I don't know whether either of these specific things is a valid example. I kinda assume that military and serious finance types have more than just a couple of prime numbers to protect their data. Then again, screw ups do happen, so maybe it's not so much more. It doesn't really matter anyway. Even if these exact things wouldn't crumble if you released an efficient prime factorising algorithm, there are numerous other things along the same lines that are bound to. RSA is a well-known and widely used algorithm that would be cracked instantly, for a start.

Now, I don't know about you, but personally I rate my personal safety, the security of my finances, the confidentiality of my health records and such pretty highly. Right about now, they're reasonably protected as far as I know. If you release something that cracks such a popular and fundamental encryption technique overnight, that all disappears. Me, I reckon I'd prefer living in the West to living in a third world country where the only people who have any control are the guys with the biggest guns, but maybe I'm just dumb like that...

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Re:The entire game changes by Anonymous+Brave+Guy · 2002-12-14 02:39 · Score: 2

Unfortunately, the sort of society you're describing couldn't come about overnight. Those sorts of changes would take many decades to get right even starting from an organised beginning. In the period that would follow the breakdown of encryption, I suspect what you'd actually see would be a period of panic and relative anarchy, a clampdown into police states to control it, and subsequently many years of corruption in governments desperate to retain control until they could reestablish the mechanisms that were there before using alternative means. The entire game doesn't change at all -- breaking one crypto algorithm isn't anything like enough to catalyse that -- it just gets really ****ty for a while.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Slashdot Mirror

Genetic Algorithm Improves Shellsort

37 of 71 comments (clear)