Deep Algorithms?
Stridar writes "A paper presented in a recent article quotes Donald Knuth as saying that computer science has 500 deep algorithms. He mentions that Euclid's algorithm is one of the most important, and he seems to agree with the idea that CS will be mature when it has 1000 deep algorithms. What I would like to ask Slashdot is the following: What are the most important algorithms in CS? What is your favorite algorithm? And finally, what are the outstanding problems for which algorithms would be immediately placed in the 'Top 1000' category?" We had an older story where two scientists picked their top ten algorithms.
The algorithm is simple, powerful and beautiful. Its properties allow it to be used for encryption or for authentication. It is simple enough that it can be described on a single piece of paper and understood with a basic mathematical background, and it has affected the e-world in many different ways, some of them still to be seen.
Of course,
Lather. Rinse. Repeat.
Anything you can do, I can do meta.
My personal favorite is the skip list: O(log n) insert, search, and delete in the average case. It's simple to understand, has good constant factors, and doesn't require maintenance such as rebalancing (unlike balanced trees). Really, what more could you want?
Here's the paper:
ftp://ftp.cs.umd.edu/pub/skipLists (many formats, including PDF)
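A minimal Python sketch of a skip list along the lines of the paper above, assuming unique, mutually comparable keys. MAX_LEVEL = 16 and a promotion probability of 0.5 are arbitrary choices, and the class and method names are illustrative, not taken from the paper.

```python
import random


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # one "next" pointer per level


class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level higher

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, never compared
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _find_update(self, key):
        # For each level, remember the last node whose key is < key.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def __contains__(self, key):
        nxt = self._find_update(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key):
        update = self._find_update(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # already present; keys are unique in this sketch
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, level)
        for i in range(level):  # splice the node into each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        update = self._find_update(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):  # unlink from every level it is on
            update[i].forward[i] = target.forward[i]
        while self._level > 1 and self._head.forward[self._level - 1] is None:
            self._level -= 1  # drop levels that became empty
        return True

    def __iter__(self):
        node = self._head.forward[0]  # level 0 is an ordinary sorted linked list
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Note that there is no rebalancing step anywhere: the random level chosen at insert time is what keeps the expected search cost logarithmic.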
You're damn right it's important.
Novel theory: Modern Man evolved from psychopath
Resolving dependencies between any number of things requires this very useful graph sorting algorithm.
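The algorithm in question is presumably topological sort. Here is a Python sketch of Kahn's version, assuming dependencies are given as a dict mapping each item to the items it needs first; the function name is hypothetical, and the initial ready set is sorted only to make the output deterministic.

```python
from collections import deque


def topological_sort(deps):
    """Order items so each comes after everything it depends on (Kahn's algorithm)."""
    # Collect every node, including ones that only appear as dependencies.
    nodes = set(deps)
    for ds in deps.values():
        nodes.update(ds)
    indegree = {n: 0 for n in nodes}     # number of unmet dependencies
    dependents = {n: [] for n in nodes}  # reverse edges: dep -> items needing it
    for item, ds in deps.items():
        for d in ds:
            indegree[item] += 1
            dependents[d].append(item)
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:  # all of m's dependencies are now satisfied
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

Python 3.9 and later also ship graphlib.TopologicalSorter in the standard library, which is the better choice in real code.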
The Official Steve Ballmer Webpage
Algorithms? It's all about PATTERNS nowadays!
Honestly, I don't think CS will be considered "mature" just by the number of complex algorithms it has.
There's more to CS than algorithms.
And, I always thought algorithms were grouped into "Discrete Mathematics" not "Computer Science" (granted, there is overlap, but isn't there overlap in most sciences??).
Good quote, too many chars. Seriously, the Slashdot 120-char limit sucks!
I don't think I'd call it my favorite algorithm, but the Boyer-Moore string searching algorithm is pretty cool.
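Full Boyer-Moore uses two shift rules; this Python sketch implements only the bad-character rule, i.e. the simplified Horspool variant, which keeps the right-to-left comparison and the long skips that make the family fast in practice. The function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Bad-character table: how far to shift when the text character aligned with
    # the pattern's last position is c. Characters not in the table shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = 0  # current alignment of pattern[0] within text
    while i <= n - m:
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            return i
        i += shift.get(text[i + m - 1], m)
    return -1
```

The behavior matches Python's str.find, which is a convenient way to test it.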
Quicksort
The Unification Algorithm
Skip Lists
Conjugate Gradients
Karmarkar's linear programming algorithm
Knuth-Morris-Pratt string matching
Multidimensional scaling
The Kernighan-Lin TSP & graph-partitioning methods
Lempel-Ziv compression
Fast Fourier Transform
Quine-McCluskey optimization
Celine/Gosper/Zeilberger/Wilf algorithm for hypergeometric identities
Fast Multipole method
-Tom Duff
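As a taste of one entry on the list above, here is a Knuth-Morris-Pratt sketch in Python (the function name is hypothetical). The failure table records, for each prefix of the pattern, how much of it can be reused after a mismatch, so the text is scanned once without ever backing up.

```python
def kmp_search(text, pattern):
    """Return the start indices of all (possibly overlapping) matches."""
    if not pattern:
        return list(range(len(text) + 1))
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches, k = [], 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = failure[k - 1]  # fall back instead of re-reading text
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```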
Paul
begin
while alarm ringing
cover head with blankets
imprecate the onerous noisemaker softly
consider turning the damn thing off
if feeling remarkably hyperactive
then
lethargically slither out of blankets
sinuously stretch out arm
sigh
bang it to kingdom come
else
go back to sleep sweet sleep
endif
if hear name being called
then
see who it is
if kid brother/sister
then
ready
aim
fire
watch baneful clock execute a parabolic trajectory
in approximate direction of youngster
if target intercepted
then
ignore howls for Amnesty International
else
swear a thousand maledictions
endif
else if father
then
get out of bed hyper-quickly
if feeling watched
then
turn alarm off gently
else
kick alarm off gently
endif
else if mother
then
scan her for arms, especially those prohibited by
Geneva Convention
if result is affirmative
then
begin negotiations
else
pretend not to have seen her
increase snoring intensity
endif
endif
if feel something cold and wet being sloshed onto
blankets
then
yell blue murder
get out
endif
endwhile
end
Dinoj Surendran @ 1995 - no rights reserved
One of the most important algorithms ever invented.
Seriously, how about Simulated Annealing or Genetic Algorithms?
It is so far-reaching.
linear programming, minimax game searches, network flow, primal-dual techniques for approximation algorithms......
The Content Scrambling System (CSS) seems pretty damn important to the MPAA and Congress. They even passed a law (the DMCA) to support it.
I'm going with the Fast Fourier Transform, because it is ubiquitous in signal processing and it has various number-theoretic applications. As an added bonus, the Quantum Fourier Transform is the key step in Shor's algorithm, which factors integers in polynomial time on a quantum computer! However, this is not yet practically realizable.
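A textbook recursive radix-2 Cooley-Tukey FFT in Python, assuming the input length is a power of two, checked against the direct O(n^2) DFT. The function names are illustrative; for real work you would use a library routine such as numpy.fft.fft.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: combine the halves
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(n^2) DFT, for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Splitting into even and odd halves at every level is what turns n^2 work into n log n.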
Chaos is a name for any order that produces confusion in our minds. --George Santayana
COordinate Rotation DIgital Computer
This algorithm has been implemented in many FPUs and even some FPGAs to calculate trigonometric and hyperbolic functions. It replaces the evaluation of those power series you've already forgotten about from school with a clever combination of bit-shifts and additions. Back in the days when multiplications were much more expensive than additions, this is how it was done.
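A sketch of rotation-mode CORDIC computing sine and cosine in fixed point, so the main loop uses only shifts, additions, and a small arctangent table. FRAC_BITS = 30 and ITERATIONS = 30 are assumptions; floating point appears only when building the tables and converting the result, and real hardware implementations differ in many details.

```python
import math

FRAC_BITS = 30             # fixed point: value = integer / 2**FRAC_BITS
ITERATIONS = 30
ONE = 1 << FRAC_BITS

# Precomputed tables (in hardware these would be ROM constants).
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
GAIN = 1.0
for _i in range(ITERATIONS):
    GAIN *= math.sqrt(1 + 2.0 ** (-2 * _i))
K = round(ONE / GAIN)      # start at 1/gain so no final multiply is needed


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta in radians, for |theta| <= pi/2."""
    x, y, z = K, 0, round(theta * ONE)
    for i in range(ITERATIONS):
        if z >= 0:  # rotate counter-clockwise by atan(2**-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:       # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / ONE, x / ONE
```

Each step rotates by plus or minus atan(2^-i), and multiplying by 2^-i is just a right shift, which is the whole trick.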
Alpha-beta pruning (an optimization of plain "minimax" search) is my favorite. It is a good way to trim your search space, but as far as I know it is used pretty much only in strategy game playing, chess specifically. The hard part is quantifying the value of the moves each player can make (number of pieces, position on the board, tactics, blah!). Unlike most tradeoffs in CS, this one saves both time and space.
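A minimal alpha-beta sketch over an explicit game tree, assuming nested Python lists where each leaf is already a numeric evaluation (producing those numbers is the hard part mentioned above). Plain minimax is included to show that pruning does not change the answer; both function names are illustrative.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of a game tree, skipping branches that cannot matter.

    A node is either a number (a leaf's evaluation) or a list of children;
    the players alternate from one level to the next.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer will never allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer already has something better elsewhere
    return value


def minimax(node, maximizing=True):
    """Plain minimax, for comparison."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```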
Dictionary.com defines an algorithm as:
A step-by-step problem-solving procedure, especially an established, recursive computational procedure for solving a problem in a finite number of steps.
Another way to think about an algorithm is this: you start out with input data in a given format, and then run a set of steps on that input data until eventually it gives you output data. The nice thing about algorithms is that, when they are correctly formulated, they can work without human intervention or thinking/reasoning (just following the steps on the data). That is why they are particularly useful for computers. But they don't have to be limited to computers. Most food recipes could be considered algorithms, that is, a set of procedures that bring you from input to output.
A good example of a computer algorithm is one of the many sorting programs. Quicksort, bubblesort, mergesort, heapsort...these are just different algorithms for taking a list of unorganized integers and by following their steps, you get a list of integers in numerical order.
When it comes to beauty in algorithms, people are generally referring to simplicity and efficiency in algorithms. Doing things in a way that most people wouldn't normally think to do them, yet doing them in terse and efficient ways (elegance).
I'm not exactly sure what is meant by a 'deep' algorithm, but I would think it would reference just how complex the task that the algorithm solves is.
My votes for best algorithms are the Sieve of Eratosthenes (an ancient Greek method for finding all prime numbers up to a given limit) and the Fast Fourier Transform, an algorithm that has revolutionized several industries.
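The Sieve of Eratosthenes fits in a few lines of Python (the function name is illustrative): cross off multiples of each prime, starting at its square, and whatever survives is prime.

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already crossed off by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, prime in enumerate(is_prime) if prime]
```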
I drink to prepare for a fight; tonight I'm very prepared. -Soda Popinski
It's well and fine to say that we should use the algorithms "that smart people like Knuth have invented", but that's not enough if you're going to make your living as a genuine developer. I agree that we should use the frameworks we are provided in order to get the highest productivity, but if you can't take some pleasure in the construction of an elegant piece of code (algorithmic or otherwise), then you're just a technician who can easily be replaced when the next fad rolls out.
You can put people like Knuth on a pedestal if you like, and that is certainly warranted in his case. But real progress will only be realized when you disregard people like him and do something of your own.
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
An algorithm is simply a series of steps one can take that, once you have finished them, will have solved your problem. A deep algorithm is one that is especially useful, applicable in many circumstances, and has some innate cleverness that makes it non-trivial to come up with.
So, for example, an algorithm for searching could be:
1. For i from the first item to the last item:
2. If item i is what you are looking for, return it.
3. Otherwise, go on to the next i.
This isn't a very fancy algorithm, but it works, and it is useful in many, many circumstances. Of course, it is also trivial to come up with (look at every item one at a time until you find your goal), and therefore isn't deep.
It is interesting to compare an algorithm to a heuristic. Heuristics would make great algorithms, if not for the bugs. That is, a heuristic is a set of steps you can follow that is likely to solve your problem, but isn't guaranteed to. In that sense, heuristics are just buggy algorithms.
There are also approximation algorithms. Suppose your problem is to find the shortest route that visits a set of cities and returns you to your starting point. This, btw, is a classic problem called the Traveling Salesman Problem, and it is provably rather nasty to solve (it belongs to a class of problems called "NP-complete"). That is, if you want the SHORTEST route, the best known exact methods take exponential time, and for even a relatively small number of cities that's a lot. However, there are algorithms that, if you follow them, are guaranteed to give you an answer no worse than twice as long as the best possible. That is, we can approximate the answer, within some provable bound of optimal, with a set of deterministic steps. (For the nitpickers: the approximation only works for metric TSP, such as Euclidean TSP, not general TSP, and therefore doesn't give a solution to the Hamiltonian Cycle problem.)
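One classic way to get the factor-of-two guarantee is the minimum-spanning-tree "double tree" method: build an MST, then visit the cities in a preorder walk of it. Given the triangle inequality, the tour is at most twice the MST weight, and the MST weighs no more than the optimal tour. A Python sketch for points in the plane, assuming at least one point; the function names are hypothetical, and the brute-force helper is only there to check the bound on tiny inputs.

```python
import math
from itertools import permutations


def mst_tour(points):
    """2-approximate TSP tour for points in the plane (MST + preorder walk)."""
    n = len(points)
    # Prim's algorithm, O(n^2), on the complete Euclidean graph.
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each city to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # A preorder walk of the tree visits every city once: that is the tour.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[a], points[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def optimal_length(points):
    """Brute-force optimum; only feasible for a handful of points."""
    return min(tour_length(points, [0, *p])
               for p in permutations(range(1, len(points))))
```

Christofides' algorithm tightens the bound to 1.5 times optimal for metric TSP, at the cost of a considerably more involved matching step.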
Any other questions?
This cartoon is about Knuth
If tits were wings it'd be flying around.
Gotta agree with you, but not on its own.
I can't narrow it down to fewer than about 50, personally. Here are the broad-brush "highlights":
a) All of quicksort, mergesort, heapsort and radixsort.
b) FFT, DFT, their relatives, whilst I'm divide and conquering. Convolutions and shite too.
c) Graph algorithms, including Kruskal's and Dijkstra's. Coloring algorithms (useful for compilers).
d) Parsing algorithms, while I've got compilers in mind
e) String matching algorithms, ditto
f) Compression algorithms - Huffman, Arithmetic, LZ*, BWT.
g) Cryptographic algorithms - Hashes, Private Key Feistel Networks, Public Key 'bignum' techniques. I'll throw in CRCs here too as they're close to hashes.
h) Bignum algorithms - Karatsuba, Barrett, Montgomery, Oooh, I've had FFTs already, can I have them again?
i) Pure Maths - Euclid, XGCD. Addition Chains (e.g. Pippenger). Eratosthenes, Bernstein-Atkin likewise.
j) Trial division, Fermat's Method, Brent/Pollard Rho, Pollard/Williams P+/-1, Lenstra's ECM, Quadratic Sieve, (S/G)NFS.
k) Applied Maths - Newton-Raphson, Runge-Kutta, Tchebyshev interpolation.
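Euclid's algorithm and its extended form (XGCD) from the list above, sketched in Python and assuming non-negative integer inputs; the function names are illustrative. The extended version also returns Bézout coefficients, which is how modular inverses are computed, for example when deriving an RSA private exponent.

```python
def gcd(a, b):
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is 0."""
    while b:
        a, b = b, a % b
    return abs(a)


def xgcd(a, b):
    """Extended Euclid: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x  # keep a*old_x + b*old_y == old_r
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Inverse of a modulo m, which exists only when gcd(a, m) == 1."""
    g, x, _ = xgcd(a, m)
    if g != 1:
        raise ValueError("a and m are not coprime")
    return x % m
```

Since Python 3.8, pow(a, -1, m) computes the same modular inverse directly.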
Too many to count...
THL
Keeping
And the correct answer is: never.
It's true that for small lists, or lists that are nearly sorted, you want to use an O(N^2) algorithm rather than (say) quicksort. The mistake is making the leap from "an O(N^2) algorithm" to "bubble sort".
There are lots of O(N^2) sorting algorithms, with different constant factors. Bubble sort is one of the worst; see Knuth (v. 3, of course) for a detailed analysis. If you're dealing with a small list or a nearly-sorted list, you should probably use insertion sort. (Or, in some special cases, you might want selection sort or merge sort instead.)
I have yet to find any case, anywhere, where bubble sort is the right choice. If I ever teach an introductory algorithms class, I will probably omit bubble sort.
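Insertion sort in Python, for reference (the function name is illustrative). Its inner loop runs once per inversion, i.e. per out-of-order pair, so on a nearly sorted list it does close to linear work, and because it only moves strictly larger elements it is stable.

```python
def insertion_sort(items):
    """Sort a list in place in O(n + d) time, where d is the number of inversions."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot right until current's spot is found.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items
```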
To be finite, the process must always end after a finite number of steps. Each step must be unambiguous. The process may require input (parameters) to solve the problem, but when complete it must return a result. The process must demonstrate effectiveness by solving the problem in a "sufficiently basic" manner.
Gotta be inventing the Internet! How could you top that?
Speaking of sorting, the scientists contemporary to Galileo used it to "patent" their as yet unverified ideas and hypotheses by publishing a "one-way hash" of the statement describing the idea, made by sorting the letters of that statement alphabetically. E.g. the hypothesis "Mars has two satellites" becomes "Aaaeehillmorsssstttw". Of course, to be secure, the statement must be much longer.
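The sorted-letter scheme is easy to reproduce. This Python sketch ignores case, spaces and punctuation (an assumption about how the letters were counted) and shows how a later reveal is checked; both function names are hypothetical.

```python
def letter_commitment(statement):
    """Sort a statement's letters, as in the anagram "hash" described above."""
    letters = sorted(c.lower() for c in statement if c.isalpha())
    return "".join(letters).capitalize()


def matches(commitment, statement):
    """Anyone can later check a revealed statement against the commitment."""
    return letter_commitment(statement) == commitment
```

Many different statements share the same sorted letters, which is exactly why a short statement makes a weak commitment.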
My favorite is Dijkstra's Communicating Semaphores, along with the related algorithms documented in Dijkstra & Riddle's paper "The 'T.H.E.' multiprogramming System".
With Mark Weiser's addition of the "T" primitive (more commonly called "non-blocking P", i.e. "Try to P, but if that would block, return an error flag instead"), you have a fantastically powerful tool in a tiny amount of code.
For instance: I was able to implement a kernel for an actor-based, real-time, prioritized, preemptive multitasking system, including initialization code, an idle task, and a minimal startup task table (i.e. everything but the application tasks and device drivers):
- In under 512 bytes of code and initialization data.
- On an 8080.
Communicating P, V, and T (along with a flavor of "V" doubling as a return-from-interrupt) are a complete set of primitives for such work.
For those not familiar, an "actor" in this context is a class such that each instance of that class or any subclass of it is a separate thread of execution. Messages are exchanged between threads via queues on semaphores rather than C++ member function calls / Smalltalk message sends, but otherwise all the object-oriented concepts apply directly.
Communicating Semaphores handle locking (like normal semaphores), message queueing, and resource allocation (by holding a queue of messages, each of which represents, or actually is, a resource).
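The buffer-pool pattern, together with the non-blocking "T", can be mimicked with Python's queue.Queue, where put() plays the role of V, a blocking get() plays P, and get_nowait() plays T. This is only an analogy under those assumptions, not the 8080 kernel; all names and the two 16-byte buffers are hypothetical.

```python
import queue

free_buffers = queue.Queue()   # communicating semaphore holding free message buffers
incoming_work = queue.Queue()  # semaphore the service routine waits on

for _ in range(2):
    free_buffers.put(bytearray(16))  # pre-allocate two buffers (the "resources")


def interrupt_handler(data):
    """Runs on the interrupted task: T a free buffer, fill it, V it onward."""
    try:
        buf = free_buffers.get_nowait()  # T: non-blocking P
    except queue.Empty:
        return False                     # no buffer free: punt and return
    buf[:len(data)] = data               # assumes len(data) <= 16
    incoming_work.put((buf, len(data)))  # V: hand the message to the service routine
    return True


def service_routine(results):
    """Handle one message, then give its buffer back to the pool."""
    buf, n = incoming_work.get()         # P: block until work arrives
    results.append(bytes(buf[:n]))
    free_buffers.put(buf)                # V: return the buffer
```

For a plain counting semaphore, Python's threading.Semaphore.acquire(blocking=False) is the direct counterpart of T.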
"T" lets interrupt routines run initially as parasites on the interrupted task, then "T" a free-message-buffer queue, fill in the message, and "V" it to the incoming-work semaphore of the actual service routine as the interrupt exits - provoking a context switch if the service routine is higher priority than whatever was running. The interrupt routine can punt and return to the interrupted task if no buffers are available.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
That's a bit like saying that there is no need to understand calculus because Newton and company already figured it out for you.
To the professional programmer an algorithm is a tool, and like any other kind of tool it is important to know how it works even if you didn't invent or produce it yourself.
This may suddenly dawn on you when you're coding an algorithm out of a book, and find you don't know whether it's safe to take a particular shortcut or not because you don't understand the algorithm well enough to analyze the implications. Or you're looking at some values in a debugger and can't figure out if they're correct or have been clobbered by a bug because you don't understand the algorithm well enough. And so on.
500 deep algorithms, and 1000 means maturity? To me this sounds a bit like Bill Gates saying that 640K is enough for anyone, or the ancient Greeks saying that mathematics was mature because Euclid had codified his geometric axioms, or the head of the US patent office saying in 1899 that everything had been invented. (All of which are probably apocryphal, but I digress.)
It's far too early. Computer science has been around for a little over half a century. Who knows what will be discovered in the centuries ahead? Mathematics is the source of many algorithms, yet new discoveries are being made in mathematics even now. Don't stop searching when we get to 1000. There are still going to be many new and wondrous algorithms for the geniuses of the future to discover.
The only thing necessary for the triumph of evil is for good men to do nothing. - Edmund Burke
Just run it under Windows and it terminates eventually just fine! Therefore it's an algorithm!
(I hated to lose the ability to mod, but I couldn't resist! :-) )
Indeed - the last time this came up I did some benchmarks. The 'old fashioned way' using temp variables when optimised runs about twice as fast as the xor version.
In not-quite-assembly it looks like:
foo(cx,dx)
mov ax,cx
mov bx,dx
mov ax,ax xor bx   ; ax = a^b
mov bx,bx xor ax   ; bx = b^(a^b) = a
mov ax,ax xor bx   ; ax = (a^b)^a = b
mov cx,ax
mov dx,bx
vs.
foo(cx,dx)
mov ax,cx
mov bx,dx
mov cx,bx
mov dx,ax
Since I'm not formally trained as a computer scientist (I'm merely an information technology major, sorry), I can't offer much in the way of "deep" algorithms to this list.
However, I can poke fun...
My personal favorite algorithm is:
(Ducks)
MacOS, Windows, BeOS, GNOME, KDE: they're all just Xerox copies
There's a clever derivative called "Shell sort", which might be what you're referring to.
It sends coarse combs across the data at first.
Then it sends finer ones, and finally the last comb is the same as the 'exchange consecutive elements only' step in bubble sort.
However, whilst it appears similar, it's actually very different because it uses iterative refinement: first coarse, then medium, then fine. The number of phases is normally much closer to about n^0.4 in typical implementations, rather than n in bubble sort.
To say it's like bubble-sort is to say quick-sort is like radix-sort. In some ways it's true, but it misses a lot of the point.
(One pass of an in-place binary radix sort is just like one pass of a quick-sort - notta lotta people know that! You lose the order-preserving nature, but you gain in-place. Basically you hard-code the pivots to be the odd multiples of decreasing powers of 2. b10000..., then b01000... and b11000... etc.)
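A sketch of Shell sort with Shell's original halving gap sequence (an assumption; later gap sequences perform better). Each pass is a gapped insertion sort, so the final gap-1 pass is an ordinary insertion sort over data that the coarser passes have already mostly ordered. The function name is illustrative.

```python
def shell_sort(items):
    """Shell sort in place with gaps n/2, n/4, ..., 1."""
    gap = len(items) // 2
    while gap > 0:
        # Gapped insertion sort: afterwards every gap-th subsequence is sorted.
        for i in range(gap, len(items)):
            current = items[i]
            j = i
            while j >= gap and items[j - gap] > current:
                items[j] = items[j - gap]
                j -= gap
            items[j] = current
        gap //= 2
    return items
```

Comb sort is the analogous idea applied to bubble sort: exchanges over a shrinking gap, finishing with adjacent swaps.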
THL.
It's in Knuth.
Keeping
Never? Can you prove that there is no data structure or pattern of changes for which bubble sort is the fastest overall? I'd be very interested to see that proof.
I still maintain that there may be a situation in which a bubble sort is the right choice, and that the question is a good one. Maybe I can't think of it (and honestly, I've never used a bubble sort in any code I've written) and apparently neither can you.
For me, the most important thing to think about is to consider the situation in which you're using the algorithms. I ask the question, because I want to challenge the person I'm interviewing. The people who immediately laugh and say "Never!!" without thinking about it for a minute get passed over. If someone thought about it for a minute and replied what you did, I'd be happy with that. I'd be just as happy if someone thought about it for a minute and said, "maybe some cases in which the list is already sorted, but I'm trying to imagine how that would work".
There may be a pattern to the way the data is changed over time that means that bubble sorts will be the best. I want people to think about that, rather than laugh about the bubble sort as a terrible algorithm. It's not; it does what it's supposed to do and nothing more. It just may be an algorithm that has an extremely narrow range of applicability.
I would definitely teach the bubble sort, even if the lesson is, "sometimes there are simple and easy-to-use algorithms that are rarely the most useful." People have to learn to recognize that.
In any field, find the strangest thing and then explore it. -John Archibald Wheeler
What about a function that calculates successive digits of pi? I'm not trying to troll here, I'm just wondering. Definitions are important things in any area of academia! As far as I know, calculating pi is not a finite process (unless you count each digit as a discrete process).
:p
A little side note: as a kid I used to crash both web servers and browsers by implementing this as a CGI script!
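On the pi question: a digit generator like the one below never halts, so strictly it fails the finiteness requirement; Knuth calls a procedure that has every property of an algorithm except possibly finiteness a "computational method". Producing the first n digits for any fixed n does terminate. This sketch is Jeremy Gibbons' unbounded spigot algorithm, a swapped-in illustration rather than whatever the CGI script mentioned above used.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi forever (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
        else:
            # Consume another term of the series to narrow the interval.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


first_ten = list(islice(pi_digits(), 10))  # any finite prefix terminates
```

Everything is exact integer arithmetic, so the digits never suffer from floating-point round-off; the integers just keep growing.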
I'm done with sigs. Sigs are lame.
As I noted in another post, one problem with skip lists is that each node must be dynamically resizable because the number of forward pointers is changing and not known at compile-time.
Another problem with skip lists is that they are not very friendly to multiple readers and writers because the nodes provide unfortunate concurrency "chokepoints". In a binary tree, for example, subtrees can be locked without blocking readers in adjacent subtrees.
cpeterso
Merge sort does too, and it is much more efficient. In other words, it's a "stable" sort. That is one reason why the C library's qsort() is often implemented as a merge sort!
All sort algorithms can be made stable by putting the original positions into the keys you are sorting.
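To illustrate the trick, here is Python's heapq used as a heapsort, which is not stable on its own, made stable by sorting on (key, original index, item) tuples; the unique index also means the items themselves are never compared. The function name is hypothetical, and Python's built-in sorted() is already stable.

```python
import heapq


def heap_sort_stable(items, key=lambda x: x):
    """Heapsort made stable by breaking key ties on the original position."""
    heap = [(key(item), index, item) for index, item in enumerate(items)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```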
-Kevin
You know there's some guy still in the shower...
OK, so it's 1987, and I'm 8 years old. My family has just gotten our first computer, an IBM PS/2 Model 30, one of the systems with BASIC in ROM. I've taken up writing in BASIC, and do so in most of my free time, which, as an eight-year-old, is a considerable amount of time. I'd taught myself all about Boolean logic, loops, etc., etc.
This is the part that I don't remember, probably because it's been obliterated by my family repeating the story so often. I've been in the shower for something like half an hour when my mother starts knocking on the door, wanting to know if I'm OK. I insist that I'm fine. This process is repeated for a while until they finally force me to get out, no doubt prune-like by this time. My mother asks me what in the world I've been doing in the shower for so long.
I point to the directions on the back of the bottle and say, simply, "Wash. Rinse. Repeat."
-Waldo Jaquith
Where x is the number of people in the elevator and y is the number of people who know for sure who farted.
It's more like you have to consider real-life situations individually. My point is very much like what you said -- you can construct a zillion different algorithms to do something, but the vast majority of them will be practically useless in anything but a really weird situation. Recognizing when they are and aren't useful is an important skill. And most of the time, you'll go with the tried-and-true quicksorts etc.
However, it's important to think about each situation. Pure algorithms are usually designed with the general case in mind. Individual situations, however, are always specific cases. Most of the time, the general solution is obviously useful. Sometimes, however, there is a special case that might not fit the mold, and one of these crazy one-off algorithms might fit the bill exactly.
In my mind, you will not get a job with me if you cannot see what makes each application of an algorithm unique. Computer science optimizes the general case. Real-life programming optimizes the special case that you're dealing with in your particular program.
In any field, find the strangest thing and then explore it. -John Archibald Wheeler
Bubble sort is frequently the right choice and, more often, the "good enough" choice. I learned this the hard way a long time ago. Unless you're sorting at least tens of thousands of items, with today's computers it's unlikely the user will notice any difference in execution time regardless of the sort algorithm used.
Bubble sort has the huge advantage that it can be programmed in about five minutes without reference to any algorithm book and it's simple enough you're unlikely to make any mistakes.
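For the "five minutes, unlikely to have bugs" argument, here is what such a bubble sort might look like in Python (the function name is illustrative). The early exit when a pass makes no swaps is a common refinement, not something the post specifies; it makes an already-sorted list cost a single O(n) pass, which is the one case where bubble sort is competitive.

```python
def bubble_sort(items):
    """Bubble sort in place, stopping early once a pass makes no swaps."""
    n = len(items)
    while True:
        swapped = False
        for i in range(1, n):
            if items[i - 1] > items[i]:
                items[i - 1], items[i] = items[i], items[i - 1]
                swapped = True
        n -= 1  # the largest remaining element has bubbled to the end
        if not swapped:
            return items
```

In Python itself, sorted() or list.sort() would still be the obvious choice.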
Academia has incorrectly given bubble sort a bad rap. The same could be said about the "goto", but that's a different discussion.
I think what Knuth means by "deep algorithm" is one that seems fundamental, something that tells us about the nature of reality, not necessarily something that is useful.
- Have a picture
And the correct answer is: never.
Off the top of my head I can think of at least three major factors BubbleSort has in its favor:
The fastest to write.
The lowest chance you will write a bug into it.
The best known. Any programmer who sees the comment /*BubbleSort*/ will have instant and complete understanding of your code. It is also the easiest to spot when it isn't commented.
BubbleSort is often the best choice for trivial tasks. A rock may not be the best tool for any job, but sometimes it is the simplest and most convenient.
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Pessimal Algorithms and Simplexity Analysis: read it, you'll like it. Find out the best algorithm to use if your boss makes you sort a list in Paris.
Why are you writing sorting routines anyway?
The fastest sort to write is the call to the library sort. qsort().
The lowest chance of writing a bug into a sort is the library sort. qsort().
The best known sort is the library sort. qsort().
Obviously other languages may have different library sorts, but IMHO any C/C++ developer who claims ignorance of qsort() is immediately and ruthlessly demoted to "2 years experience with little likelihood of succeeding in the field" category. This is a hard line, but I have yet to hear any reasonable excuse for being ignorant of the basic tools of your profession and being proud of it.
There are rare circumstances where I'll write my own sorts... but only after looking HARD for a way to call the library sort, and only because I've had a full year of graduate-level algorithms. Writing a good sort routine is *hard*, and it should only be done by people who know sorts cold. E.g., can you provide the running time and worst-case performance of quicksort, Shell sort and heapsort, and say when those sorts might be worth the effort instead of using the standard library sort?
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
GCD is used for simplifying rational numbers. So it's used in pretty much any decent maths library.
And rational numbers have a *lot* of uses. CAD, spreadsheets, high precision calculations, financial figures where rounding is unacceptable etc.
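A minimal sketch of the normalization step a rational-number type performs with GCD, using math.gcd from Python's standard library; the function name is hypothetical, and fractions.Fraction already does this for you.

```python
from math import gcd


def reduce_fraction(numerator, denominator):
    """Normalize a fraction: lowest terms, positive denominator."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    g = gcd(numerator, denominator)  # math.gcd always returns a non-negative value
    numerator, denominator = numerator // g, denominator // g
    if denominator < 0:  # keep the sign on the numerator
        numerator, denominator = -numerator, -denominator
    return numerator, denominator
```

Keeping every value in lowest terms is what lets equality be tested by comparing numerators and denominators directly, and it keeps the integers from growing without bound.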
I expect GCD also has implications for packing problems and complex scheduling algorithms where you need a quick check on which items are likely to "fit" together effectively. Anyone have experience of this?