Slashdot Mirror


Sort Linked Lists 10X Faster Than MergeSort

virusfree tells us about a new algorithm that has been developed that the author claims can sort a linked list up to 10 times faster than MergeSort. "BitFast," a member of the Hash algorithms family, is available in C and C++ under the GPL.

27 of 326 comments (clear)

  1. How is this not a radix sort? by Anonymous Coward · · Score: 5, Informative
    1. Re:How is this not a radix sort? by acidrain · · Score: 4, Informative

      This algorithm is exactly how my first year comp-sci teacher taught radix sort. I'm guessing "VirusFree" stole his lecture notes.

      It's actually much cheaper to use arrays of indices or arrays of pointers internally instead of linked list operations, but "BitFast" is a good naive implementation if you don't wasn't to confuse first year comp-sci students who only just learned about linked lists.

      Here is a good paper of radix sort optimization, and it covers using radix sort with floating point.

      --
      -- http://thegirlorthecar.com funny dating game for guys
  2. It's radix sort. by Nick+Mathewson · · Score: 5, Informative
    From the page:

    4. Logic of BitFast BitFast works with numbers in the bit level. It applies masks to cut the numbers so it uses only the desired ( based on the programmer/problem specifications ) number of bits every run. It always start with the LSB ( Less Significant Bits ) ignoring the any other bits and every run it moves to the higher block of bits until the whole list is sorted. So is we want to sort 8 bit every run , the first timer it will sort 0-7 bits ( Least significant ), then the 8-15 bits , then the 16-23 bits and for last the 24-32 bits ( Most significant ) , having now sorted the entire list of 32bit numbers
    Congratulations, you have reinvented radix sort.
    1. Re:It's radix sort. by vidarh · · Score: 5, Informative

      Making it in place is just an implementation detail, just as most other sorts can be done either in place or not (a typical example is quicksort, where most naive implementations tends to copy, while the typical C implementations tends to be in place).

  3. Nice, but... by i+kan+reed · · Score: 1, Informative

    It's still in the N log(N) time class. Mergesort has minimal time overhead to prepare and only N memory overhead. Do we have that for this algorithm?

    1. Re:Nice, but... by dlthomas · · Score: 5, Informative

      I guess it technically *is* O(N * log(N)) to the number of items, but this is misleading. It's actually O(N).

      As pretty much everyone has pointed out, it's just radix sort. The time taken by radix sort scales linearly to the number of keys, and with the log of the maximum key that can be held by the container.

      If we're dealing entirely with unique keys, this is of course >= log(N), by the pigeonhole principle and all that. If we may have duplicate keys, however, there may be more keys than container space, and radix *does* scale linearly to the keys.

      Time to prepare and overhead of this alrogrithm are negligable in any context large enough for us to care about the O() of the algorithms. For some things, this *is* better than mergesort. I'm just not sure why it was posted, as it's also not new - it was invented over 50 years ago.

  4. Only lists of integers by iamacat · · Score: 2, Informative

    Nice, but you could just copy the list to array and do counting/radix sort. Does nothing for strings, floating point values, structures...

  5. radix sort vs. comparison sort by Peter+Desnoyers · · Score: 5, Informative

    It's a bit of an apples vs. oranges comparison to put this up against mergesort - mergesort is a comparison-based sort, while Papadopoulos' bitfast is a radix sort and thus O(N*W) where N is the number of elements and W is the width of each element in bits. (hint - try sorting 1000-byte strings with bitfast, and see which is fastest) And no, it doesn't have anything to do with hashing.

    However, it's definitely a clever way of implementing radix sort with linked lists, which may make it useful in some cases (e.g. OS internals) where you don't want to allocate space for a big directly-addressable array.

    1. Re:radix sort vs. comparison sort by Wavicle · · Score: 5, Informative

      Actually it is much worse than you say.

      I don't know what possessed me to look at his code, but he really hasn't shown what he claims to have shown. It isn't clear to me that his code is doing what he says, but what IS clear to me is that his comparison to mergesort is very unfair. He has a TRULY AWFUL implementation of merge sort. For example, every time he goes to split a sub-list in half, he does a linear search from the current node to the end the sub-list. Having determined this value he then does ANOTHER LINEAR SEARCH to find the half-way point.

      So he has basically made an O(N^2) time complexity process just to divide the list for the merge sort. This is inside his n*log(n) merge sort. So he has an O(n^3) time complexity mergesort and trumps up how fast his modified radix sort is. Come on! Bubble sort would have beat his mergesort.

      His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.

      --
      Education is a better safeguard of liberty than a standing army.
      Edward Everett (1794 - 1865)
    2. Re:radix sort vs. comparison sort by Wavicle · · Score: 4, Informative

      But then anyone who thinks the roundabout way he used of accessing the top 16 bits of a 32-bit memory value is 'cool' is definitely on my list of people most likely to re-invent the radix sort badly.

      It is clearly a slow day for me. I have gone one step further and ported his code over to Linux (only required a change to GetTickCount) and attempted to run it. His "cool" way of using a pointer hack has caused his code to dump core. The problem is his hack being a signed short int pointer.
      Program received signal SIGSEGV, Segmentation fault.
      0x08048752 in BitFast (TailNode=0x804a018, run=0) at bitfast.h:37
                          if (Tail[val]) { //if not 0
      (gdb) p val
      $1 = -22087

      So I try putting unsigned on ptrHack.
      Program received signal SIGSEGV, Segmentation fault.
      0x08048772 in BitFast (TailNode=0x804a018, run=0) at bitfast.h:39
                          tmpN->nextNode = pN;
      (gdb) p tmpN
      $1 = (Node *) 0xffff

      Yeah. Not so sure that casting is what he really wants. I'm not entirely clear how it is his code will work. I really wanted to clean up his mergesort and re-run it to see how valid his claim is, but it looks hopeless. I'll work on it a little more, but since the problem seems to be in his algorithm, it doesn't look real hopeful.

      --
      Education is a better safeguard of liberty than a standing army.
      Edward Everett (1794 - 1865)
    3. Re:radix sort vs. comparison sort by adrianmonk · · Score: 4, Informative

      He has a TRULY AWFUL implementation of merge sort. For example, every time he goes to split a sub-list in half, he does a linear search from the current node to the end the sub-list. Having determined this value he then does ANOTHER LINEAR SEARCH to find the half-way point.

      So he has basically made an O(N^2) time complexity process just to divide the list for the merge sort. This is inside his n*log(n) merge sort. So he has an O(n^3) time complexity mergesort and trumps up how fast his modified radix sort is. Come on! Bubble sort would have beat his mergesort.

      That is maybe a bad implementation, but it actually does not make it O(n^3). In fact, it doesn't make it as bad as even O(n^2). Here's why:

      The first thing I want to establish is that linearly iterating halfway through the sublist a second time (to find the middle of it) has no effect on the asymptotic analysis. It just means going through the sublist 3/2 times, which is still a constant.

      The second thing is this: as you go deeper and deeper into the recursion, the size of the sublist is repeatedly cut in half. Thus, at every level as you go deeper into the recursion you are doing twice as many loops as the previous level (one loop for the first level, when the sublist equals the original list; two loops for the second, when you are breaking halves into quarters; four loops in the third level of recursion; and so on). BUT, although you are doing more loops, the size of the sublist is always half what it was at the previous level of recursion. So have you twice as many loops but half as many loop iterations.

      That means at every level of recursion (think of this size of the call stack if that helps in visualizing it), you are looping over the entire list once (well actually 3/2 times). And there are O(log n) levels of recursion, because every time you recurse, you are working on a list half as big as before.

      The result is that the total extra added time of this unnecessary linear traversal is O(n * log n) for the entire algorithm. Since mergesort is already O(n * log n), the extra looping to find the middle and end of the list has no effect on the asymptotic behavior of the algorithm.

    4. Re:radix sort vs. comparison sort by Wavicle · · Score: 2, Informative

      That was indeed the problem. I did eventually get the thing running and, interestingly, despite using many published mergesort algorithms for sorting linked lists, the bitsort algorithm generally ran a constant 3-4x faster. So it appears that I was wrong about his mergesort having an asymptotic characteristic of O(n^3) and improving the efficiency of his mergesort gave only incremental performance improvement.

      Unfortunately this followup will forever be buried below the mod threshhold but it appears that despite my best efforts, radix sort (bitsort in this case) is superior to mergesort for linked lists. I tested different list sizes, different random ranges and a few other things I could think of, and bitsort always ran faster.

      The only weirdness I could not get rid of is that every time I compiled using gcc -O2 I got segfaults. I did not have time to track it down, I suspect stack smashing due to array usage irregularities.

      --
      Education is a better safeguard of liberty than a standing army.
      Edward Everett (1794 - 1865)
  6. wtf? seriously. by tomstdenis · · Score: 4, Informative

    First, it's just a radix sort [well that's my take]. Second there are too many google ads on that page. Third, merge sort is O(n log n) time, hard to beat for random data. Fourth, most people use variations of the QUICK SORT not merge sort.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:wtf? seriously. by Anonymous+Brave+Guy · · Score: 3, Informative

      Third, merge sort is O(n log n) time, hard to beat for random data.

      Provably impossible, in fact.

      Fourth, most people use variations of the QUICK SORT not merge sort.

      I'm not sure that's a fair generalisation. There are several good O(n log n) sorting algorithms, heap sort being another obvious one, and typically things like the data structures involved, need for stability, or ease of parallelisation are the deciding factors in practice. In fact some of the best performing algorithms in real terms are theoretically O(n^2) in the worst case, hence the creation of hybrids like introsort, a quicksort variant designed to avoid the worst case by switching to heap sort after a certain recursion depth is reached.

      Anyway, I'm afraid this is another article by a 21-year-old student who is well-meaning, but not nearly experienced enough yet to appreciate that his suggestion is just rehashing (no pun intended) a lot of long-established ideas. I think the giveaway sign is that his CV on the site still lists just about every programming language and web development skill in existence. :-)

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:wtf? seriously. by kasperd · · Score: 2, Informative

      Fourth, most people use variations of the QUICK SORT not merge sort.
      Maybe for data that fits in memory. But if the data is on disk and you can't fit all of it in memory, nothing beats multiway mergesort.
      --

      Do you care about the security of your wireless mouse?
    3. Re:wtf? seriously. by Dwedit · · Score: 2, Informative

      Merge sort only requires extra memory to store a list of indexes or pointers, not extra memory for the full data. If your data is integers, then it's the same size. If your data is strings, then the array of indexes is smaller.
      As a plus, merge sort requires no recursion, which may be helpful if your stack is really tiny, like 200 bytes large.

    4. Re:wtf? seriously. by dkf · · Score: 2, Informative

      The kings of the sorting hill are quicksort, heapsort, mergesort, and insertion sort. For real data, quicksort (especially with a good pivot selector sub-algorithm) and heapsort are fastest (heapsort is also very frugal with the stack). Mergesort is a bit slower (but still O(n log n) with reasonable factors) and has the major bonus of being stable, so that elements that are equal to each other are not reordered (and that's very useful indeed with real data). Insertion sort (also stable) is the only O(n2) algorithm worth considering, and then only in a special case: when you've got some sorted data and want to add an item to it, this being a case that comes up surprisingly often. Insertion sort is also really easy to implement correctly.

      Radix sorts are reasonable, but only if the input keys are fairly short. I've never had data like that myself; for 10 character strings, a good quicksort is theoretically better for lists of a million keys or so, which is pretty good. ;-) Also, C programmers should definitely prefer to use quicksort, of course, because they've got an implementation of it in libc...

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
  7. Nice try, b ut... by Ancient_Hacker · · Score: 4, Informative
    I think you may have re-invented the "radix sort".

    It was used by 80-col card sorters, since circa 1925.

    See Knuth, Volume 3

  8. Re:multithreaded merge by TERdON · · Score: 4, Informative

    I have studied a course in parallel algorithms, including localized parallel merge sort (the algorithm you requested). It can be used to subdivide parallelized sorting theoretically unlimitedly. Links: Course homepage, The relevant chapter (PDF) of the course slides, with nice pictures and everything.

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  9. Re:GPL-ed algorithm? by Helios1182 · · Score: 4, Informative

    You can't copyright an algorithm, so there is no license to worry about. This is a well meaning but utterly clueless person who reimplemented a well known algorithm and then made a strange constant-factor instead of asymptotic analysis.

  10. Did KD set this guy up for ridicule on purpose? by grimJester · · Score: 5, Informative

    Ok, I'll start from his site:

    - Programming Skills : Visual Basic ***Excellent***

    Yes, that certainly is... excellent.

    - Message : "Don't ever let school, stop you from learning!"

    School could, learn you some grammar.

    The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.
    *** DO NOT COPY ANYTHING FROM THIS PAGE TO ANY OTHER PAGE. IF YOU WANT SOMEONE TO READ THIS THEN LINK TO THIS PAGE ***


    In addition to trying to apply copyright to an algorithm, doesn't a restriction on copying defeat the purpose of releasing something under the GPL? Or does text in all caps trum previous text not in all caps?

    Feel free to add to this. If there are some clips of this guy with lightsabers, pleas post them.

    1. Re:Did KD set this guy up for ridicule on purpose? by Brandybuck · · Score: 3, Informative

      Geez, were you ever lucky! A ten ton joke just wooshed by inches over the top of your head! You could have been killed!

      --
      Don't blame me, I didn't vote for either of them!
  11. Shock horror, stop the presses by sholden · · Score: 2, Informative

    Retard who thinks you can GPL an algorithm manages to reinvent an algorithm first implemented as a computer program in the 1950s, and used way before then outside of digital computers.

    Also manages to not know about memset and write C code that assumes sizeof(long)==4 and puts non-inline code in header files. I don't dare look at the C++ code since there's a large probability it will cause permanent brain damage.

  12. Be nice by l3mst0r · · Score: 2, Informative

    I invented Radix sort once, too. Who hasn't? Let him have his 15 minutes of fame.

  13. Re:multithreaded merge by CoughDropAddict · · Score: 2, Informative

    I notice that the course homepage talks about designing for an "idealised parallel computer" -- did it take into account the huge speed hit that comes from false sharing on real processors? That should be a major consideration if you are designing shared-memory parallel algorithms, since they can degrade performance by 1700% on modern processors. If you're not careful, your parallel algorithms could end up being much slower than single-threaded ones simply because they force so many cache bounces.

  14. Re:Multiple bugs in the code, wrong measurements ! by slashdot.org · · Score: 2, Informative
    Good points. But, we should give the guy some credit, he does state in the readme:


    This examples are for Proof of Concept only.

    They are not the most error free,optimized,commented,beautifull code
    to have ever been written


    Personally I would never write code like this, proof of concept or not, avoiding the portability problems for example would not have taken more time to write. But pressumably more experience. So probably the best thing to do is be constructive about it.

    I'll give it a shot on some of the issues that I found:

    - don't assume long = 32 bit and short int = 16 bit. Different compilers/platforms will have different sizes. If you need to proof of concept something and don't want to deal with the portability issues, do something like:

    typedef unsigned long u32;
    typedef unsigned short u16;

    And then use u32/u16 throughout. That way, if someone wants to port the code, all they have to do is define the proper typedefs for their platform (I usually put that stuff in it's own header, like "to_port.h" or something).

    Also, it makes it very obvious that you are relying on certain vars to be of a certain size.

    - don't rely on how variables are stored in memory i.e. on what endian-ness the architecture is. If you take a 32 bit unsigned integer, the low 0-7 bits are not stored at the same memory address on different architecture. If you don't understand what I'm talking about, google for little-endian & big-endian. If you don't want to deal with special casing it, do something like #define LITTLE_ENDIAN 1 in your to_port.h, and then in the code do something like

    #ifndef LITTLE_ENDIAN
    #error This code assumes little endian
    #endif
    - this code:

    long Ptrs[65535];
          long Tail[65535];
    stores 512KB on the stack. That's generally considered quite extravagent, to put it mildly.

    - use memset instead of this code:

    for(n=0;n<65535;n++){
            Head[n]=0;
            Tail[n]=0;
        }
    as an example, Microsoft's compiler can inline memset making it extremely fast, compared to a for loop. Btw. the write-through cache on the CPU would probably prefer it if this was broken up into two separate loops, so memory gets accessed linearly, instead of jumping back and forth.
  15. Ripping apart the source by jinxidoru · · Score: 4, Informative

    The previous few posts caused me to also open up the code. Wow! What a ride! I decided that I wanted to add a few things I found as well.

    I loved that he referred to his use of the short pointer as a "cool pointer hack to avoid shifts and ifs!!!" To begin with, if we have to call that a "cool hack" then the requirements for coolness have definitely dropped a bit. I am also confused as to how this helps avoid shifts and ifs. As far as I can tell, it avoids one shift. I'd love to see the code without the "cool hack" because I'm intrigued as to how he uses an if statement to remove the higher bits.

    As stated before, I am also fond of his use of an array of size 65535 rather than 65536. I surprised he didn't run into any segmentation faults. I know that I can give him a test case that will seg fault pretty easily.

    I also question whether he understands the purpose of a header file, as the entire source for his BitSort is contained within a header file. I guess it would sort of make sense if the function was declared inline, but it isn't.

    Another fun element is how non-modular his code is. Everything is hard-coded, even though it would have been easy to declare a few constants or even make the function a template.

    Lastly though, why the crap did this article get through the editors. It is, as has been stated over and over, a radix sort and nothing different. I know that he claims it is different because it is in-place, but that is such an obvious simple change that it does not warrant a whole new algorithm. Regardless of the content, the article is so poorly written with some of the worst grammar I have read in quite some time. Slashdot really needs to improve its article standards.