Slashdot Mirror


Sort Linked Lists 10X Faster Than MergeSort

virusfree tells us about a new algorithm that has been developed that the author claims can sort a linked list up to 10 times faster than MergeSort. "BitFast," a member of the Hash algorithms family, is available in C and C++ under the GPL.

17 of 326 comments (clear)

  1. How is this not a radix sort? by Anonymous Coward · · Score: 5, Informative
  2. It's radix sort. by Nick+Mathewson · · Score: 5, Informative
    From the page:

    4. Logic of BitFast BitFast works with numbers in the bit level. It applies masks to cut the numbers so it uses only the desired ( based on the programmer/problem specifications ) number of bits every run. It always start with the LSB ( Less Significant Bits ) ignoring the any other bits and every run it moves to the higher block of bits until the whole list is sorted. So is we want to sort 8 bit every run , the first timer it will sort 0-7 bits ( Least significant ), then the 8-15 bits , then the 16-23 bits and for last the 24-32 bits ( Most significant ) , having now sorted the entire list of 32bit numbers
    Congratulations, you have reinvented radix sort.
    1. Re:It's radix sort. by vidarh · · Score: 5, Informative

      Making it in place is just an implementation detail, just as most other sorts can be done either in place or not (a typical example is quicksort, where most naive implementations tends to copy, while the typical C implementations tends to be in place).

    2. Re:It's radix sort. by Tim+Browse · · Score: 5, Insightful

      Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?

      Well, I'm not sure, but I'm pretty sure it's not going to be the guy who made it again 30 years after everyone was already using it.

    3. Re:It's radix sort. by Lon · · Score: 5, Funny

      my sort goes to 11 ... wait, that is only three :(

  3. Re:Radix Sort by Anonymous Coward · · Score: 5, Funny

    How is this different from Radix sort?

    Marketing.

  4. radix sort vs. comparison sort by Peter+Desnoyers · · Score: 5, Informative

    It's a bit of an apples vs. oranges comparison to put this up against mergesort - mergesort is a comparison-based sort, while Papadopoulos' bitfast is a radix sort and thus O(N*W) where N is the number of elements and W is the width of each element in bits. (hint - try sorting 1000-byte strings with bitfast, and see which is fastest) And no, it doesn't have anything to do with hashing.

    However, it's definitely a clever way of implementing radix sort with linked lists, which may make it useful in some cases (e.g. OS internals) where you don't want to allocate space for a big directly-addressable array.

    1. Re:radix sort vs. comparison sort by Wavicle · · Score: 5, Informative

      Actually it is much worse than you say.

      I don't know what possessed me to look at his code, but he really hasn't shown what he claims to have shown. It isn't clear to me that his code is doing what he says, but what IS clear to me is that his comparison to mergesort is very unfair. He has a TRULY AWFUL implementation of merge sort. For example, every time he goes to split a sub-list in half, he does a linear search from the current node to the end the sub-list. Having determined this value he then does ANOTHER LINEAR SEARCH to find the half-way point.

      So he has basically made an O(N^2) time complexity process just to divide the list for the merge sort. This is inside his n*log(n) merge sort. So he has an O(n^3) time complexity mergesort and trumps up how fast his modified radix sort is. Come on! Bubble sort would have beat his mergesort.

      His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.

      --
      Education is a better safeguard of liberty than a standing army.
      Edward Everett (1794 - 1865)
    2. Re:radix sort vs. comparison sort by Tim+Browse · · Score: 5, Insightful

      I don't know what possessed me to look at his code

      Damn you! You made me look at his code! The goggles, they do nothing.

      His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.

      I can't help feeling he should have declared Ptrs and Tails as Node* arrays, and bypassed all that random casting to longs. Not sure what's going on there. But then anyone who thinks the roundabout way he used of accessing the top 16 bits of a 32-bit memory value is 'cool' is definitely on my list of people most likely to re-invent the radix sort badly.

      At first, I thought it wasn't a stable sort, but looking further into it, that's because he mixed up the 'head' and 'tail' descriptions in the explanation (or possibly in the code).

      It is amusing that some /. posters think that doing this in-place is somehow an amazing leap of insight. Sometimes /. is like reading thedailywtf.com - you see something dumb as the main story, and then find half the people commenting on it have even less of a clue.

      I must be new here.

  5. Re:FROTHY PISS by Anonymous Coward · · Score: 5, Funny

    No.
    You are an "HTML programmer".
    An oxymoron and a moron, simultaneously.
    Defenestrate yourself.

  6. That doesn't matter! by Anonymous Coward · · Score: 5, Funny

    This was written by VirusFree! Who wouldn't want to use something by a guy named VirusFree? It's a million times better than something by FreeVirus.

  7. Re:wtf? seriously. by phazer · · Score: 5, Insightful

    Fifth, the author tries to "GPL" the algorithm, which is utter nonsense. GPL deals with copyright, so the most he can do is GPL his implementation of his bucket-/radix sort. Anyone is free to re-implement the algorithm, GPL or not.

    It would require a software patent to restrict the use of the algorithm to GPL programs.

    (And sixth, a quick look in a text book would have clued the author in)

  8. You're making the baby Knuth cry by aprosumer.slashdot · · Score: 5, Funny

    You're making the baby Knuth cry. Please read "The Art of Computer Programming" before proceeding any further.

  9. Re:Nice, but... by dlthomas · · Score: 5, Informative

    I guess it technically *is* O(N * log(N)) to the number of items, but this is misleading. It's actually O(N).

    As pretty much everyone has pointed out, it's just radix sort. The time taken by radix sort scales linearly to the number of keys, and with the log of the maximum key that can be held by the container.

    If we're dealing entirely with unique keys, this is of course >= log(N), by the pigeonhole principle and all that. If we may have duplicate keys, however, there may be more keys than container space, and radix *does* scale linearly to the keys.

    Time to prepare and overhead of this alrogrithm are negligable in any context large enough for us to care about the O() of the algorithms. For some things, this *is* better than mergesort. I'm just not sure why it was posted, as it's also not new - it was invented over 50 years ago.

  10. Did KD set this guy up for ridicule on purpose? by grimJester · · Score: 5, Informative

    Ok, I'll start from his site:

    - Programming Skills : Visual Basic ***Excellent***

    Yes, that certainly is... excellent.

    - Message : "Don't ever let school, stop you from learning!"

    School could, learn you some grammar.

    The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.
    *** DO NOT COPY ANYTHING FROM THIS PAGE TO ANY OTHER PAGE. IF YOU WANT SOMEONE TO READ THIS THEN LINK TO THIS PAGE ***


    In addition to trying to apply copyright to an algorithm, doesn't a restriction on copying defeat the purpose of releasing something under the GPL? Or does text in all caps trum previous text not in all caps?

    Feel free to add to this. If there are some clips of this guy with lightsabers, pleas post them.

  11. Re:FROTHY PISS by VENONA · · Score: 5, Insightful

    Explanation: pretty much bogus.

    As for the quality of your post: let me guess--you also complain about slow, bloated software, right? The old, "Intel giveth, Microsoft taketh away," adage? Users have several orders of magnitude more compute and storage power than 'back in whatever day' yet personal computers seem little more responsive, etc.

    Don't feel lonely. There's a large population of lugnuts like you, who, if they think of CS at all, largely carp about how some CS departments haven't become current technology de jour tradeschools. Some, unfortunately, have, but that's a whole different discussion, which has been seen on Slashdot time and again.

    Algorithm research is important, as is having at least something of a grasp of algorithms. In *your* next programming exercise, since you seem to regard sort efficiency as 'esoteric', feel free to reinvent the bubble sort. Also, tout it as the Next Great Thing, and submit patches against all your favorite apps. That will get you your twenty seconds of fame, I assure you.

    Sometimes I love Slashdot--but then I read a post from some random AC idiot like you: the proverbial lowest common denominator. Maybe you should restrain your efforts toward what you seem to regard as cool snarky posts, watch a thread (about which you plainly know nothing) develop, and maybe freaking *learn* something.

    OTOH, maybe I'm just bitter right now, because I've just been doing a search through Google news on climate change, and I'm pretty much convinced that the last thing the human race needs right now is people like you.

    --
    What you do with a computer does not constitute the whole of computing.
  12. Multiple bugs in the code, wrong measurements ! by wtarreau · · Score: 5, Interesting

    Don't use this code !

    First, there are multiple bugs in the code itself :

    1) everywhere, it is assumed that 2^16 = 65535 and not 65536. Arrays are allocated for this size too, but are used with 65536 entries. (un)fortunately for him, the space is probably allocated anyway so the program does not crash on this.

    2) a similar bug in the random number generator carefully avoids dangerous values, because he generates numbers between min and max by doing a modulo on max-min , which prevents max from being produced. This is not a
    big deal anyway, just that it hides the bug.

    3) he assumes everywhere that "long" is unsigned and "short int" too. This makes the code SegFault on Linux with negative array indexes. The fix is obvious, though.

    And most of all : times are measured in optimal conditions. Indeed, this algorithm is
    just a radix sort with a large radix (2^16). A radix algorithm is efficient when the number
    of elements is high compared to the radix, because at every level, all radix entries have to
    be scanned for existing values. In his example, he sorts 4 million elements. His code is faster
    than the merge sort on 4 million entries, just as would be any radix sort. But on smaller sets,
    it still takes the same time per level, because it has to scan all 65536 entries even if there
    are only 100 numbers. The fact is that for less than 10000 values, the mergesort becomes faster in his own code.

    When measuring algorithms complexity, we use the O(x) notation. The time per iteration is never
    considered. When the time matters (as in his example), we should consider the time per iteration (T), and the total time (t) will follow :

          t = T . complexity

    In complexity, his algorithm is O(N) on 16 bits, but the time per iteration is very high. Other sorting algorithms are O(N.log(N)). So the total time is :

          t1 = T1 . O(N) for his algorithm, and :
          t2 = T2 . O(N.log(N)) for the mergesort.

    But when T1 > T2.log(N), he looses, which is almost always because log(N) is small for 16 bits, and T2 too because operations are very simple. In his case, T1 is very high because it represents the time required to scan 65536 values.

    It's amazing that he fell in such a beginner's trap. I suppose that he's still a student, which would explain the total lack of portability and newbie's bugs in the code.

    Well tried anyway :-)

    Willy