Sort Linked Lists 10X Faster Than MergeSort
virusfree tells us about a new algorithm that has been developed that the author claims can sort a linked list up to 10 times faster than MergeSort. "BitFast," a member of the Hash algorithms family, is available in C and C++ under the GPL.
It's useless to talk about the speed of a sort algorithm without discussing its complexity.
Linear improvements are completely uninteresting. Take algorithm X in Perl, then a version in optimized assembly, and there you go, a 10X improvement without having done anything groundbreaking.
A 10X faster bubble sort is just as useless for large enough values of n, and a 10x slower merge sort would still have a very good performance in comparison.
His algorythm is "in place"
radix is not
I agree though that the concepts for both algorithms are very similar.
Sixth, the headline "10X faster" is incorrect, as they differ asymptotically, not by a constant factor. (Run different data sets...vary by size of a single element, and by number of elements in the list. The ratio will change.)
AFAIK, quicksort (as well as mergesort) is optimal algorithm for comparison based sort, which means you can not find anything asymptotically better than that.
And how about submitting it to academic journals if you come up with something better?
Fifth, the author tries to "GPL" the algorithm, which is utter nonsense. GPL deals with copyright, so the most he can do is GPL his implementation of his bucket-/radix sort. Anyone is free to re-implement the algorithm, GPL or not.
It would require a software patent to restrict the use of the algorithm to GPL programs.
(And sixth, a quick look in a text book would have clued the author in)
Actually sorting linked lists is useful in many cases, and can be done fairly efficiently with a well written quicksort (without the step of converting to array that this code does), since quicksort is conceptually just recursive partition around a pivot element - partitioning a linked list is cheap and can sometimes be faster than using an array if the objects being sorted are complex and stored by value (i.e. in any case where moving the linked list pointers around would be faster than copying the objects). A variety of other sorts are also easily adaptable to linked lists.
This is a variant of RadixSort, which is well known to be faster than any comparison sort such as MergeSort. The problem with non-comparison sorts is that as such they are restricted to sorting items representable a unstructured bit fields, which means, essentially, integers. A large part of the time, the real problem in sorting is (a) extracting the fields that you want to use as keys (since it is not generally the case that you want to sort on the entire record) and (b) arranging for each pair of records to compare as you need them to, which involves both recognizing the internal structure of keys (consider the case of dates) and imposing suitable weights for the individual components. In other words, in many situations the bulk of the code and time are devoted to parsing and transformation of records. So long as you are not using a really bad algorithm, the time devoted to the sort itself is likely to be a small percentage of the total time.
For example, I have written a sort utility that differs from most others in its ability to parse the input into records and records into fields and in the transformations it can apply so as to use unusual sort orders and sort on things like dates, month names, and numbers in non-Western number systems. It was originally written for alphabetizing dictionaries of "exotic" languages. It is frequently the case that the time devoted to the actual sort is less than 1% of the run time.
In sum, non-comparison sorts have a niche but are of limited utility because they get their speed from making use of additional information that is only available for a limited set of datatypes. For the great majority of applications, only comparison sorts are flexible enough to be of use.
I am TheRaven on Soylent News
Perhaps coming from a math background, my priorities differ from those with a CS background, but...
Where's the proof?
Is it peer reviewed? Has it been published? Does it exist?
Although the lack of a proof does not make your algorithm is incorrect, I'd be suspicious. In fact, I would be unwilling to use any algorithm in my code without being sure that a good proof of its correctness exists somewhere.
Proofs are more than formalities, they're essential: without a good proof, you're just a crank selling snake oil.
Nothing sucks like a Vax, nothing blows like a PowerMac G4
Unless I'm very much mistaken, there's a blatant buffer overflow in this code. In bitfast.h the "Tail" and "Head" arrays are defined as having 65535 (0xffff) entries. That's from 0-65534 inclusive.
The code in "BitFast" then uses the array as if it can access Head[0xffff] and Tail[0xffff] - which is one past the end of the arrays. I'd guess it works by coincidence on Win32.
Additionally, although there might be some reason I can't follow at the moment, in the float version only the top 15 bits of the values are seemingly used. This would seem to ignore the sign bit of the FP representation. Maybe there's another subtlety I'm missing, but I can't see it at the moment.
Given these basic misunderstandings of programming concepts I'm not surprised the authors hadn't known of the radix sort which many people here have linked as being very similar indeed.
GodboltBlog
Explanation: pretty much bogus.
As for the quality of your post: let me guess--you also complain about slow, bloated software, right? The old, "Intel giveth, Microsoft taketh away," adage? Users have several orders of magnitude more compute and storage power than 'back in whatever day' yet personal computers seem little more responsive, etc.
Don't feel lonely. There's a large population of lugnuts like you, who, if they think of CS at all, largely carp about how some CS departments haven't become current technology de jour tradeschools. Some, unfortunately, have, but that's a whole different discussion, which has been seen on Slashdot time and again.
Algorithm research is important, as is having at least something of a grasp of algorithms. In *your* next programming exercise, since you seem to regard sort efficiency as 'esoteric', feel free to reinvent the bubble sort. Also, tout it as the Next Great Thing, and submit patches against all your favorite apps. That will get you your twenty seconds of fame, I assure you.
Sometimes I love Slashdot--but then I read a post from some random AC idiot like you: the proverbial lowest common denominator. Maybe you should restrain your efforts toward what you seem to regard as cool snarky posts, watch a thread (about which you plainly know nothing) develop, and maybe freaking *learn* something.
OTOH, maybe I'm just bitter right now, because I've just been doing a search through Google news on climate change, and I'm pretty much convinced that the last thing the human race needs right now is people like you.
What you do with a computer does not constitute the whole of computing.
Well, I'm not sure, but I'm pretty sure it's not going to be the guy who made it again 30 years after everyone was already using it.
Damn you! You made me look at his code! The goggles, they do nothing.
His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.I can't help feeling he should have declared Ptrs and Tails as Node* arrays, and bypassed all that random casting to longs. Not sure what's going on there. But then anyone who thinks the roundabout way he used of accessing the top 16 bits of a 32-bit memory value is 'cool' is definitely on my list of people most likely to re-invent the radix sort badly.
At first, I thought it wasn't a stable sort, but looking further into it, that's because he mixed up the 'head' and 'tail' descriptions in the explanation (or possibly in the code).
It is amusing that some /. posters think that doing this in-place is somehow an amazing leap of insight. Sometimes /. is like reading thedailywtf.com - you see something dumb as the main story, and then find half the people commenting on it have even less of a clue.
I must be new here.
1. Although it's technically O(n) compared to QuickSort's O(n log n), consider the fact that
a. the number of passes required by radix sort is proportional to the key size, and that
b. if the keys are primarily unique, then the key length will be O(log n)
2. One cute feature of radix sort for very large data sets is that you only need linear access to the data, so it can be done with non-random-access external storage. The one practical implementation I've ever heard of used four 9-track tape drives.
However, considering use in main memory, both radix sort and quicksort make efficient use of a D-cache that is 2-way set-associative or better, and both hit the whole data set once per pass -> if it swaps, it'll swap a lot
Are you kidding ???
Obviously you don't know what sort algorithms are use for ! Any real time system or multi-task
subsystem (application or kernel) needs a scheduler to very quickly sort timers in order to wake
up tasks at the right moment without scanning the whole list. For instance, in a proxy like
Squid, you would need something like 10000 struct timeval to handle 10000
connections. If the algorithm takes milliseconds to sort 10000 values, your proxy will be
the slowest on earth. Worse, if it takes as much time when you have only 10 or 100 connections,
it is a real shame. Sorting 10000 values should take microseconds in such applications.
The same is true for kernel code. Did you ever wonder how the socket timeouts are handled ?
When you have to update your timeouts at every packet, you need something very fast,
clearly not an O(n^2) algorithm, nor bitfast
Willy