Sort Linked Lists 10X Faster Than MergeSort
virusfree tells us about a new algorithm that has been developed that the author claims can sort a linked list up to 10 times faster than MergeSort. "BitFast," a member of the Hash algorithms family, is available in C and C++ under the GPL.
http://en.wikipedia.org/wiki/Radix_sort
Marketing.
Nice, but you could just copy the list to array and do counting/radix sort. Does nothing for strings, floating point values, structures...
It's a bit of an apples vs. oranges comparison to put this up against mergesort - mergesort is a comparison-based sort, while Papadopoulos' bitfast is a radix sort and thus O(N*W) where N is the number of elements and W is the width of each element in bits. (hint - try sorting 1000-byte strings with bitfast, and see which is fastest) And no, it doesn't have anything to do with hashing.
However, it's definitely a clever way of implementing radix sort with linked lists, which may make it useful in some cases (e.g. OS internals) where you don't want to allocate space for a big directly-addressable array.
First, it's just a radix sort [well that's my take]. Second there are too many google ads on that page. Third, merge sort is O(n log n) time, hard to beat for random data. Fourth, most people use variations of the QUICK SORT not merge sort.
Tom
Someday, I'll have a real sig.
It's useless to talk about the speed of a sort algorithm without discussing its complexity.
Linear improvements are completely uninteresting. Take algorithm X in Perl, then a version in optimized assembly, and there you go, a 10X improvement without having done anything groundbreaking.
A 10X faster bubble sort is just as useless for large enough values of n, and a 10x slower merge sort would still have a very good performance in comparison.
It was used by 80-col card sorters, since circa 1925.
See Knuth, Volume 3
Let me know when there's a video on Youtube of the sort doing it's thing with colored bits and animation.
Reviewing just the first hour of video games.
No.
You are an "HTML programmer".
An oxymoron and a moron, simultaneously.
Defenestrate yourself.
This was written by VirusFree! Who wouldn't want to use something by a guy named VirusFree? It's a million times better than something by FreeVirus.
I've never seen it done before, but I don't imagine it would be particularly diffucult to pull off. I'd say just add another parameter that tells the function how deep in the call stack it is. If it's less than x layers deep, fork with the child process going on to the other processor. The parent process sorts the first half of the list, and the child process sorts the second half. When the parent process finishes, it waits for the child process to finish (if it hasn't already), then collects the child's sorted list and merges as usual. The reason you'd want to track how deep you are is so that you don't wind up forking when there aren't any more processors available. For instance, if you were running on a two processor machine, you'd only want to fork once, since forking again would only add overhead without any gain. You could probably generate x on the fly based on the number of processors in the user's computer, too.
The only problem I forsee is the case where the added overhead exceeds the gain. However, I imagine this would only be the case when you are sorting very small lists. Perhaps one could check the size of the list first and then decide whether or not forking is worth it.
Sunwalker Dezco for Warchief in 2016
You're making the baby Knuth cry. Please read "The Art of Computer Programming" before proceeding any further.
I've done a bit of scientific programming, but I'm not a computer-science-oriented person by any stretch. Thus, I'm a little confused by this. Under what license are things like Quicksort and Stupidsort (for another, very bad example) "released"? Why would a person feel it necessary to release an algorithm under any kind of license at all, in fact? No textbook would want to include GPL-ed code--wouldn't that make the entire book GPL by nature of the license?
Please correct me if I'm mistaken, but this seems like a fairly bad precedent to set.
I have studied a course in parallel algorithms, including localized parallel merge sort (the algorithm you requested). It can be used to subdivide parallelized sorting theoretically unlimitedly. Links: Course homepage, The relevant chapter (PDF) of the course slides, with nice pictures and everything.
I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
I guess it technically *is* O(N * log(N)) to the number of items, but this is misleading. It's actually O(N).
As pretty much everyone has pointed out, it's just radix sort. The time taken by radix sort scales linearly to the number of keys, and with the log of the maximum key that can be held by the container.
If we're dealing entirely with unique keys, this is of course >= log(N), by the pigeonhole principle and all that. If we may have duplicate keys, however, there may be more keys than container space, and radix *does* scale linearly to the keys.
Time to prepare and overhead of this alrogrithm are negligable in any context large enough for us to care about the O() of the algorithms. For some things, this *is* better than mergesort. I'm just not sure why it was posted, as it's also not new - it was invented over 50 years ago.
> First of all, algorithm running times and complexities are not compared as 3x or 5x faster. Please give me running time in Big O notation. Saying 10 times faster than merge sort makes no sense.
But I came up with a sorting algorithm yesterday that's 12 times faster than bubblesort! Click here to see my advertisements....
Seriously, mergesort (but NOT quicksort) is optimal, O(n log n) in one model. But there are sub-n log n sorting algorithms in other models. Some of them are based upon radix sort, surprise, surprise....
Ok, I'll start from his site:
- Programming Skills : Visual Basic ***Excellent***
Yes, that certainly is... excellent.
- Message : "Don't ever let school, stop you from learning!"
School could, learn you some grammar.
The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.
*** DO NOT COPY ANYTHING FROM THIS PAGE TO ANY OTHER PAGE. IF YOU WANT SOMEONE TO READ THIS THEN LINK TO THIS PAGE ***
In addition to trying to apply copyright to an algorithm, doesn't a restriction on copying defeat the purpose of releasing something under the GPL? Or does text in all caps trum previous text not in all caps?
Feel free to add to this. If there are some clips of this guy with lightsabers, pleas post them.
This is a variant of RadixSort, which is well known to be faster than any comparison sort such as MergeSort. The problem with non-comparison sorts is that as such they are restricted to sorting items representable a unstructured bit fields, which means, essentially, integers. A large part of the time, the real problem in sorting is (a) extracting the fields that you want to use as keys (since it is not generally the case that you want to sort on the entire record) and (b) arranging for each pair of records to compare as you need them to, which involves both recognizing the internal structure of keys (consider the case of dates) and imposing suitable weights for the individual components. In other words, in many situations the bulk of the code and time are devoted to parsing and transformation of records. So long as you are not using a really bad algorithm, the time devoted to the sort itself is likely to be a small percentage of the total time.
For example, I have written a sort utility that differs from most others in its ability to parse the input into records and records into fields and in the transformations it can apply so as to use unusual sort orders and sort on things like dates, month names, and numbers in non-Western number systems. It was originally written for alphabetizing dictionaries of "exotic" languages. It is frequently the case that the time devoted to the actual sort is less than 1% of the run time.
In sum, non-comparison sorts have a niche but are of limited utility because they get their speed from making use of additional information that is only available for a limited set of datatypes. For the great majority of applications, only comparison sorts are flexible enough to be of use.
Well, consider the source. It was written by a 21 y/o student who has listed his primary language as Visual Basic. Obviously he isn't trained in theory. That said, a nice discovery for him, but not Slashdot news, and definitely not academic journal material.
Quicksort may have a best expected sorting time, but still has an O(n^2), since O is worst case. I think the STL takes a pass at the list to determine how sorted the list appears to be and then picks a sorting algorithm based on that... but it's been awhile since I looked at it.
Retard who thinks you can GPL an algorithm manages to reinvent an algorithm first implemented as a computer program in the 1950s, and used way before then outside of digital computers.
Also manages to not know about memset and write C code that assumes sizeof(long)==4 and puts non-inline code in header files. I don't dare look at the C++ code since there's a large probability it will cause permanent brain damage.
His main BitFast code is in header files...
Messy and inconsistent indenting (check c/ and c++/ files for differences)
Difference between c/* and c++/* is that c++/ has cout instead of printf()... well done...
This stuff should go to worsethanfailure.com (was TheDailyWTF.com)
Perhaps coming from a math background, my priorities differ from those with a CS background, but...
Where's the proof?
Is it peer reviewed? Has it been published? Does it exist?
Although the lack of a proof does not make your algorithm is incorrect, I'd be suspicious. In fact, I would be unwilling to use any algorithm in my code without being sure that a good proof of its correctness exists somewhere.
Proofs are more than formalities, they're essential: without a good proof, you're just a crank selling snake oil.
Nothing sucks like a Vax, nothing blows like a PowerMac G4
Unless I'm very much mistaken, there's a blatant buffer overflow in this code. In bitfast.h the "Tail" and "Head" arrays are defined as having 65535 (0xffff) entries. That's from 0-65534 inclusive.
The code in "BitFast" then uses the array as if it can access Head[0xffff] and Tail[0xffff] - which is one past the end of the arrays. I'd guess it works by coincidence on Win32.
Additionally, although there might be some reason I can't follow at the moment, in the float version only the top 15 bits of the values are seemingly used. This would seem to ignore the sign bit of the FP representation. Maybe there's another subtlety I'm missing, but I can't see it at the moment.
Given these basic misunderstandings of programming concepts I'm not surprised the authors hadn't known of the radix sort which many people here have linked as being very similar indeed.
GodboltBlog
1. Loop ......................AddHobbie ("Computers"); ................Do Until 1=2
Since he says he's "**Excellent**" in Visual BASIC, one could ponder how he MIXED UP THE Do AND Loop STATEMENTS.
2. Don't ever let school, stop you from learning!
He apparently never had...
3. 10 times faster than MergeSort? That's asymptotically, right? O(10nlogn). Call the press. (Doesn't matter that Radix Sort actually takes O(n), as long as it's 10X!)
4. BitFast start by creating two arrays of size 65535 ( pow(2,16) )
Anyone got a calculator? Somehow his numbers don't add up. Is it actually possible for a positive power of two to be odd?
And...Since when do you use pow to calculate powers of two?
5. They are not the most error free,optimized,commented,beautifull code to have ever been written
Gee...really?
Explanation: pretty much bogus.
As for the quality of your post: let me guess--you also complain about slow, bloated software, right? The old, "Intel giveth, Microsoft taketh away," adage? Users have several orders of magnitude more compute and storage power than 'back in whatever day' yet personal computers seem little more responsive, etc.
Don't feel lonely. There's a large population of lugnuts like you, who, if they think of CS at all, largely carp about how some CS departments haven't become current technology de jour tradeschools. Some, unfortunately, have, but that's a whole different discussion, which has been seen on Slashdot time and again.
Algorithm research is important, as is having at least something of a grasp of algorithms. In *your* next programming exercise, since you seem to regard sort efficiency as 'esoteric', feel free to reinvent the bubble sort. Also, tout it as the Next Great Thing, and submit patches against all your favorite apps. That will get you your twenty seconds of fame, I assure you.
Sometimes I love Slashdot--but then I read a post from some random AC idiot like you: the proverbial lowest common denominator. Maybe you should restrain your efforts toward what you seem to regard as cool snarky posts, watch a thread (about which you plainly know nothing) develop, and maybe freaking *learn* something.
OTOH, maybe I'm just bitter right now, because I've just been doing a search through Google news on climate change, and I'm pretty much convinced that the last thing the human race needs right now is people like you.
What you do with a computer does not constitute the whole of computing.
Of course one would hope that the patent examiners at least have heared about radix sort, and therefore would reject that patent due to prior art ... but then, maybe he can avoid that by adding "with a computer" :-)
The Tao of math: The numbers you can count are not the real numbers.
Inside bitfast.h:
// with /* */, malloc and new[]
o rt
long *Tail = new long[65535];
long *Head = new long[65535];
for(n = 0; n < 65535; n++){ Head[n] = 0; Tail[n] = 0; }
Memory leak of 128KB for each time the function "Node* BitFast(Node* HeadNode,long run)" is called.
MISSING: delete[] Tail; delete[] Head;
Furthermore the following is faster or use memset:
long Tail[65535] = {0};
long Head[65535] = {0};
Unless you are running this in DOS16, DOS32 with Pharlap or smaller 8051, 68000 processor, you do not need to use new[] or malloc() for this.
In InsertToList, Node *tmpP = (Node*)malloc(sizeof(Node));
free(tmpP) missing inside FreeList()... no such function
In main.c:
Node *L1 = malloc(sizeof(Node));
Node *L2 = malloc(sizeof(Node));
MISSING: free(L1); free(L2);
In main.cpp:
Node *L1 = new Node;
Node *L2 = new Node;
Instead use (no need to use heap, use stack memory):
Node L1 = {0}, L2 = {0};
Mixing the usage of std::cout and printf() is not recommanded.
Use all of the former or the later, else you will have weird display on some run, unless you flush the output buffer. I suggest you use printf() all the way.
cout << "Ok!!!!" << "\n\n" << "Merge Sort : " << tim1 << " ms\n" << "BitFast : " << tim2 << " ms\n\n";
becomes:
printf("Ok!!!!\n\nMerge Sort : %ld ms\nBitFast : %ld ms\n\n", tim1, tim2);
Furthermore, your implementation of link list, and calling Length(Node*) for every equal is highly inefficient, requires O(n). Store the link list length somewhere when inserting. EqualSplit takes O(1.5n) instead of O(0.5n) because of that.
Something like:
typdef struct LinkList {
Node head;
int length;
} LinkList;
As a result, MergeSort recursively call itself, which calls EqualSplit every turn.
You should make your header file C friendly, and avoid mixing
No clue why this header is needed:
#include <iso646.h> instead of not/and use use ! and &&
before: if(not p) return NULL;
becomes: if(!p) return NULL;
Finally, BitFast is not recursive; therefore, you should try to find an implementation of merge sort which is not recursive also, so you save on function calls, if possible.
"However, iterative, non-recursive, implementations of merge sort, avoiding method call overhead, are not difficult to code." -- Wikipedia
http://en.wikipedia.org/wiki/Merge_s
Your function should be in a file.c or be inline.
No. Quicksort is O(n log n) on average. There are pathological scenarios, but all other instances absorb the O(n^2) worst case. Sedgewick's PhD (under Knuth's) was about Quicksort, try reading it if you are nuts enough. Or even read Knuth's volume on sorting, to see hem distilating and comparing even the constants from the algorithms.
The Art of Computer Programming, vol 3, Sorting and Searching, 1973, by Donald Knuth Page 173, Algorithm R - radix list sort. No credit is given. The algorithm has been well known and obvious since the first link list was invented. Hey! A good choice for a patent!
Don't use this code !
:
:-)
First, there are multiple bugs in the code itself :
1) everywhere, it is assumed that 2^16 = 65535 and not 65536. Arrays are allocated for this size too, but are used with 65536 entries. (un)fortunately for him, the space is probably allocated anyway so the program does not crash on this.
2) a similar bug in the random number generator carefully avoids dangerous values, because he generates numbers between min and max by doing a modulo on max-min , which prevents max from being produced. This is not a
big deal anyway, just that it hides the bug.
3) he assumes everywhere that "long" is unsigned and "short int" too. This makes the code SegFault on Linux with negative array indexes. The fix is obvious, though.
And most of all : times are measured in optimal conditions. Indeed, this algorithm is
just a radix sort with a large radix (2^16). A radix algorithm is efficient when the number
of elements is high compared to the radix, because at every level, all radix entries have to
be scanned for existing values. In his example, he sorts 4 million elements. His code is faster
than the merge sort on 4 million entries, just as would be any radix sort. But on smaller sets,
it still takes the same time per level, because it has to scan all 65536 entries even if there
are only 100 numbers. The fact is that for less than 10000 values, the mergesort becomes faster in his own code.
When measuring algorithms complexity, we use the O(x) notation. The time per iteration is never
considered. When the time matters (as in his example), we should consider the time per iteration (T), and the total time (t) will follow
t = T . complexity
In complexity, his algorithm is O(N) on 16 bits, but the time per iteration is very high. Other sorting algorithms are O(N.log(N)). So the total time is :
t1 = T1 . O(N) for his algorithm, and :
t2 = T2 . O(N.log(N)) for the mergesort.
But when T1 > T2.log(N), he looses, which is almost always because log(N) is small for 16 bits, and T2 too because operations are very simple. In his case, T1 is very high because it represents the time required to scan 65536 values.
It's amazing that he fell in such a beginner's trap. I suppose that he's still a student, which would explain the total lack of portability and newbie's bugs in the code.
Well tried anyway
Willy
It's only March. There'll be an inventor of the internal combustion engine along shortly.
I invented Radix sort once, too. Who hasn't? Let him have his 15 minutes of fame.
Maybe this guy has re-invented radix sort? I can't really tell! I wish somebody would post and tell me!
1. Although it's technically O(n) compared to QuickSort's O(n log n), consider the fact that
a. the number of passes required by radix sort is proportional to the key size, and that
b. if the keys are primarily unique, then the key length will be O(log n)
2. One cute feature of radix sort for very large data sets is that you only need linear access to the data, so it can be done with non-random-access external storage. The one practical implementation I've ever heard of used four 9-track tape drives.
However, considering use in main memory, both radix sort and quicksort make efficient use of a D-cache that is 2-way set-associative or better, and both hit the whole data set once per pass -> if it swaps, it'll swap a lot
"Those who cannot remember the past are doomed to repeat it." - George Santayana
I notice that the course homepage talks about designing for an "idealised parallel computer" -- did it take into account the huge speed hit that comes from false sharing on real processors? That should be a major consideration if you are designing shared-memory parallel algorithms, since they can degrade performance by 1700% on modern processors. If you're not careful, your parallel algorithms could end up being much slower than single-threaded ones simply because they force so many cache bounces.
The previous few posts caused me to also open up the code. Wow! What a ride! I decided that I wanted to add a few things I found as well.
I loved that he referred to his use of the short pointer as a "cool pointer hack to avoid shifts and ifs!!!" To begin with, if we have to call that a "cool hack" then the requirements for coolness have definitely dropped a bit. I am also confused as to how this helps avoid shifts and ifs. As far as I can tell, it avoids one shift. I'd love to see the code without the "cool hack" because I'm intrigued as to how he uses an if statement to remove the higher bits.
As stated before, I am also fond of his use of an array of size 65535 rather than 65536. I surprised he didn't run into any segmentation faults. I know that I can give him a test case that will seg fault pretty easily.
I also question whether he understands the purpose of a header file, as the entire source for his BitSort is contained within a header file. I guess it would sort of make sense if the function was declared inline, but it isn't.
Another fun element is how non-modular his code is. Everything is hard-coded, even though it would have been easy to declare a few constants or even make the function a template.
Lastly though, why the crap did this article get through the editors. It is, as has been stated over and over, a radix sort and nothing different. I know that he claims it is different because it is in-place, but that is such an obvious simple change that it does not warrant a whole new algorithm. Regardless of the content, the article is so poorly written with some of the worst grammar I have read in quite some time. Slashdot really needs to improve its article standards.
cartoon about Knuth