Sort Linked Lists 10X Faster Than MergeSort

How is this not a radix sort? by Anonymous Coward · 2007-02-25 07:49 · Score: 5, Informative

http://en.wikipedia.org/wiki/Radix_sort

Re:How is this not a radix sort? by truthsearch · 2007-02-25 08:10 · Score: 1

Marketing. BitFast sounds faster... with bits.

--
Developers: We can use your help.
Re:How is this not a radix sort? by Anthony · 2007-02-25 08:12 · Score: 1

I thought his explanation was a bit terse and "hand-wavy". It seems it was actually a description of his research methods.

--
Slashdot: Where nerds gather to pool their ignorance
Re:How is this not a radix sort? by Anonymous Coward · 2007-02-25 08:18 · Score: 0

http://en.wikipedia.org/wiki/Radix_sort
Could someone volunteer to improve the writing in that article? At least for that atrocious first paragraph?
Re:How is this not a radix sort? by Sillygates · 2007-02-25 09:11 · Score: 4, Funny

My favorite is still: http://www.dangermouse.net/esoteric/intelligentdes ignsort.html

Sorts in O(1) and uses no memory!

--
I fear the Y2038 bug
Re:How is this not a radix sort? by acidrain · 2007-02-25 10:55 · Score: 4, Informative

This algorithm is exactly how my first year comp-sci teacher taught radix sort. I'm guessing "VirusFree" stole his lecture notes.
It's actually much cheaper to use arrays of indices or arrays of pointers internally instead of linked list operations, but "BitFast" is a good naive implementation if you don't wasn't to confuse first year comp-sci students who only just learned about linked lists.
Here is a good paper of radix sort optimization, and it covers using radix sort with floating point.

--
-- http://thegirlorthecar.com funny dating game for guys
Re:How is this not a radix sort? by andreyw · 2007-02-25 11:01 · Score: 1

Ya. plus "VirusFree" can go fuck himself. You can't copyright an algorithm. Only it's implementation.
Re:How is this not a radix sort? by Tatarize · 2007-02-25 13:27 · Score: 0, Flamebait

My question is where the hell does this guy think that 10X faster is, first, impressive, and secondly worth a damn? Has he never heard of time complexity? Merge sort is order O(N(logN)) ten times faster than that and you get O(N(logN)) that's a speed boost if I ever saw one. Finally, this is really a radix sort, which is O(N) because it it is bitwise rather than comparison. That's not ten times faster that is log(N) times faster.

This isn't amazing, I wrote a radix sort back in college. It's an impressive algorithm. This is just that but crappier. Why not write an even faster integer sorting program.

void supermegasort(vector <int> &tosort) {
assert(!farts.stink());
vector <int> everynumber(1<<(sizeof(int)-1),0);
for (int i=0;i < tosort.size();i++) everynumber[tosort[i]]++;
for (int i=0,v=0;i < everynumber.size();i++) for(int q=0;q<everynumber[i];q++)tosort[v++]=i;
}

Yeah, negative numbers would be rather... fun. But, note... I asserted that my farts don't stink. So you can't be critical of that great code there.

--

It is no longer uncommon to be uncommon.
Re:How is this not a radix sort? by mgiuca · 2007-02-25 17:43 · Score: 1

Wow that's genius!

I was a big fan of dangermouse's esoteric languages some years ago:
http://www.dangermouse.net/esoteric/

I didn't realise he'd been developing these esoteric algorithms :)
Re:How is this not a radix sort? by Anonymous Coward · 2007-02-25 18:12 · Score: 0

oye! this is retarded ... who published this on slashdot :P

the no. of ads on there site and the amount of hits they must have got would have surely made them rich.
Re:How is this not a radix sort? by Dirtside · 2007-02-25 19:27 · Score: 3, Funny

Pfft, everyone knows the best sort is random sort. Randomize the list. Is it sorted? If not, randomize it again! You'll hit it eventually!

--
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
Re:How is this not a radix sort? by macintostie · 2007-02-25 22:04 · Score: 1

I fail to see it is cheaper to use arrays of {indices,pointers}. With radix sort the algorithms sequentially goes through the linked list, so the costly linked list operations you are referring to are no issue. With comparison sort algorithms (quicksort, ...) you are right (many cmp's, many fwd-backward in de list). With radix sort it is actually smart to use a linked list, so you don't have to prepare for the worst-case scenario when only one bucket contains all the data (note, 2^16 possibilities as 16 bits of the 32 bits are the same) (to take the worst case into account, each buckets needs to be 2^16 bits large, but with 2^16 bits buckets, all the memory is used on a 32-bits machine).
Re:How is this not a radix sort? by julesh · 2007-02-25 23:48 · Score: 1

Strangely enough, when I RTFA to try to compare them, all I see is that exact same link on the page.
Re:How is this not a radix sort? by StarvingSE · 2007-02-25 23:59 · Score: 1

What do you call your algorithm, NP-sort?

--
I got nothin'
Re:How is this not a radix sort? by Anonymous Coward · 2007-02-26 01:08 · Score: 0

Actually, it's O(0), and uses no memory.

Luv,

The Sorter

x ,--|>o--.
x '-------'
Re:How is this not a radix sort? by hackstraw · 2007-02-26 01:13 · Score: 2, Interesting

Funny, all I saw on the "site" was a donation link or something. Looked like a parked domain or something.

But radix sorting is cool. What is very relevant to it in 2007 is that it scales superlinearly with multiple cores/CPUs which is important since most computers today have hardware that execute multiple threads simultaniously.
Re:How is this not a radix sort? by Anonymous Coward · 2007-02-26 01:22 · Score: 0

Only me, (apol. "Loadsamoney" Enfield)

See also this post for the C ode I posted a while ago....

http://groups.google.co.uk/group/alt.test.d/browse _thread/thread/a7474fff8b9e4738/daa319a1a385fedf?l nk=st&q=P%3DNP+alt+test&rnum=16#daa319a1a385fedf

Love again,

The Sorter

2^0.=0.
Re:How is this not a radix sort? by kalirion · 2007-02-26 03:05 · Score: 1

The best part is that depending on the list size and the pseudo-rng, you could get stuck in a pattern where the list NEVER gets sorted!
Re:How is this not a radix sort? by Bob-taro · 2007-02-26 03:34 · Score: 1

So I guess an equivalent "evolution sort" joke would be: if you wait millions of years, natural processes will cause the list to sort itself.

--
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
Re:How is this not a radix sort? by try_anything · 2007-02-26 05:21 · Score: 1

My question is where the hell does this guy think that 10X faster is, first, impressive, and secondly worth a damn? Has he never heard of time complexity?

Whether I walk, ride my bike, or drive my car, I cover a distance that is roughly linear with time. Are all of these methods are roughly equivalent? Join the real world. I care whether my build finishes in twenty seconds or two hundred seconds. I care whether my text editor takes 200 milliseconds or 2 seconds to respond to a command.
Of course performance differences of this kind depend heavily on architecture and implementation, so it's suspicious to see an announcement without any architectural details.
Re:How is this not a radix sort? by Intron · 2007-02-26 05:33 · Score: 1

I tried to write an evolution sort, but my code crashed due to a missing link.

--
Intron: the portion of DNA which expresses nothing useful.
Re:How is this not a radix sort? by ravenlock · 2007-02-26 05:41 · Score: 1

I tried to write an Evolution sort, but as usual, Evolution crashed on me :(
Re:How is this not a radix sort? by IpalindromeI · 2007-02-26 07:34 · Score: 1

No, bogosort.

--

--
Promoting critical thinking since 1994.
Re:How is this not a radix sort? by jvkjvk · 2007-02-26 07:37 · Score: 1

So, you have invented serial QC! Congratulations!

Or one could formulate the problem such that the result of the random sort is the solution...
Re:How is this not a radix sort? by iapetus · 2007-02-26 11:22 · Score: 1

Which is why you should use a real random number generator rather than a pseudo-random one for this sort algorithm.

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Re:How is this not a radix sort? by Tatarize · 2007-02-28 14:48 · Score: 1

Well, what if every year the speed of any of those modes of transport doubled in speed. How much would it matter when the time taken for any of the modes is about 1 second? Don't you think it would be an actual break through if rather than get there in linear time, you got there in say constant time? That would be an actual marked improvement. It would take a wormhole in this analogy, but never the less. That would be the a shift in time complexity. If you could travel at the speed of light or roughly jogging speed it wouldn't much matter if compared to instantaneous travel.

Even though ion propulsion is really kinda slow, if you're going far enough it will easily beat any rocket ship. Just sits there accelerating slowly for a long long time. Which is a marked improvement on a linear time mode of transit. Even in the real world, time complexity matters. Don't get me wrong, I did redesign an algorithm the other day to make it twice as fast because it was in constant use. Twice as fast is still twice as fast. In the computer world it is worth spending a couple hours to double the speed of somethings rather than waiting 18 months.

The real kicker is this is just a radix sort, which is a time complexity break through compared with merge sort. A merge sort is O(log(N)N), whereas Radix sort is O(N). He could simply sort more units. He might as well say his algorithm is 1,000,000X faster. This would be trivial to 'prove'. Just sort a trillion integers.

--

It is no longer uncommon to be uncommon.
Re:How is this not a radix sort? by try_anything · 2007-02-28 19:13 · Score: 1

Given that datasets and user expectations are growing along with speed, you should imagine an expanding universe in which I can walk 10,000 miles per hour, but I have to go 5,000 miles to get to the grocery store and buy milk :-P

It's radix sort. by Nick+Mathewson · 2007-02-25 07:49 · Score: 5, Informative

From the page:

4. Logic of BitFast BitFast works with numbers in the bit level. It applies masks to cut the numbers so it uses only the desired ( based on the programmer/problem specifications ) number of bits every run. It always start with the LSB ( Less Significant Bits ) ignoring the any other bits and every run it moves to the higher block of bits until the whole list is sorted. So is we want to sort 8 bit every run , the first timer it will sort 0-7 bits ( Least significant ), then the 8-15 bits , then the 16-23 bits and for last the 24-32 bits ( Most significant ) , having now sorted the entire list of 32bit numbers

Congratulations, you have reinvented radix sort.

Re:It's radix sort. by abradsn · 2007-02-25 08:00 · Score: 4, Insightful

His algorythm is "in place"

radix is not

I agree though that the concepts for both algorithms are very similar.
Re:It's radix sort. by vidarh · 2007-02-25 08:10 · Score: 5, Informative

Making it in place is just an implementation detail, just as most other sorts can be done either in place or not (a typical example is quicksort, where most naive implementations tends to copy, while the typical C implementations tends to be in place).
Re:It's radix sort. by lordsid · 2007-02-25 08:53 · Score: 1

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?

--
IMAGE VERIFICATION IS EVIL!
Re:It's radix sort. by Anonymous Coward · 2007-02-25 09:06 · Score: 1, Funny

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?
The person who patented it, of course. :P
Re:It's radix sort. by rosscoe · 2007-02-25 09:40 · Score: 4, Funny

Wow, he uses the new 33bit ints as well, enabling numbers twice as high as the so-old-school 32bit ints and making it just that 'little bit' better.
Re:It's radix sort. by Tim+Browse · 2007-02-25 09:41 · Score: 5, Insightful

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?
Well, I'm not sure, but I'm pretty sure it's not going to be the guy who made it again 30 years after everyone was already using it.
Re:It's radix sort. by Bozzio · 2007-02-25 09:55 · Score: 0, Redundant

I wish I had mod points!

+1 Funny

--
I just pooped your party.
Re:It's radix sort. by Lon · 2007-02-25 10:08 · Score: 5, Funny

my sort goes to 11 ... wait, that is only three :(
Re:It's radix sort. by xant · 2007-02-25 11:12 · Score: 1

*spit take*

--
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Re:It's radix sort. by seaturnip · 2007-02-25 11:17 · Score: 1

Uhhh you realize fission was first made to work in the 1940s in the Manhattan Project? You are thinking of fusion. (... which also already works, it's just not outputting more energy than it takes in.)
Re:It's radix sort. by AnalogFile · 2007-02-25 11:21 · Score: 0

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?
Fission is not an invention. It is well known and has been for several years. It's the nuclear reaction that forms the basis of A bombs and nuclear power plants. Early studies are credited to Enrico Fermi

In case you actually ment "fission", that also is not an invention. And it also has been known for some time. It is the nuclear reaction that makes stars visible and that is used in the H bombs.

We also know that cold fission is possible (while cold fusion is not). It has been done for 20+ years. Both fusion and cold fission are controllable and eventually usable for something different than bombs. Fusion is used in nuclear power plants. Cold fission is not because very fiew of the currently possible tecnologies for controlled cold fission are energy positive (that is, most of them actually consume energy instead of producing it). One very interesting research, that started in Canada, had several years of development in Italy and is now continuing in Japan (I think Mitsubishi is doing it, but am not sure) is in the use of controlled cold fusion to destroy the radioactive nuclear wastes produced in fusion reactors.
Re:It's radix sort. by Anonymous Coward · 2007-02-25 13:26 · Score: 0

I think lordsid doesn't quite understand the meaning of "in place" here. It doesn't mean "in place" as in "implemented". "In place" means that the elements to be sorted aren't copied around on the heap but instead are "pointed" to by the now-sorted list of "pointers."

"Anonymous"
Re:It's radix sort. by lordsid · 2007-02-25 14:17 · Score: 1

Eh you got me, wrong on two parts. Teach me to stick my nose where I don't know the subject. I thought it was a good quip though.

--
IMAGE VERIFICATION IS EVIL!
Re:It's radix sort. by pz · 2007-02-25 14:42 · Score: 1

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it?

I'm not sure you've written what you meant to here.

First, fission, that is, the breaking of atomic nuclei into separate clusters of particles with a concomitant release of energy, has been around on Earth for billions of years, as it's a natural phenomenon (see, eg, http://www.ocrwm.doe.gov/factsheets/doeymp0010.sht ml). Doing it in a controlled way to extract energy or explosive power, well the idea is something like 100 years old, and it was brought to fruition about 60 years ago. We remember the names of both sets of people (eg, Einstein, Bohr, Oppenhiemer, Fermi, Rutherford, Teller, etc.).

Second, you probably mean fusion. That, too, is really old, as it's a natural phenomenon. Powers the Sun, you see. Now, perhaps you meant a controlled fusion reaction in order to extract energy, or explosive power. The latter has already been done, and we remember both sets of names (see the list above). Perhaps, you really meant controlled fusion reactions to extract electrical power. That's still being worked on, but saying that it will be, "finally invented," is, well, an odd turn of phrase. It *has* been invented. The ideas (see that same list above) are all pretty old. The devil, it turns out, is in the details, and the details are really, really hard. So hard and so expensive that the current best hope, ITER, requires cooperation from seven countries (if you lump all of the EU into one country). We don't know who will be running the show when ITER, or some successor, finally creates a sustainable energy-positive burn, but I hope we remember their name as well as we remember the names above.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Re:It's radix sort. by bob.appleyard · 2007-02-25 14:59 · Score: 1

All rather confused. Swap "fusion" and "fission" in the parent, except for a few cases at the end.

--
How dare you be so modest!! You conceited bastard!!
Re:It's radix sort. by Anonymous Coward · 2007-02-25 19:47 · Score: 0

you realize fission was first made to work in the 1940s in the Manhattan Project?
No, it was discovered in the late 1930s in Germany, which earned Otto Hahn the nobel prize in chemistry in 1944 (http://nobelprize.org/nobel_prizes/chemistry/laur eates/1944/index.html).
Re:It's radix sort. by naoursla · 2007-02-25 23:10 · Score: 1

Hasn't fission already been invented? Heck, I am pretty sure that even *controlled* fission has already been invented.

I think Albert Einstein might have had something to do with thinking up fission. I do not know who is responsible for the first nuclear reactor.

So, I guess in answer to your question: the person who thought it up.
Re:It's radix sort. by twoshortplanks · 2007-02-26 01:42 · Score: 1

The first nuclear reactor undergoing a chain reaction was built by Enrico Fermi (et. al.), and was completed on 2nd December 1942. The idea of nuclear fission was proposed by Otto Robert Frisch and Lise Meitner rather than by Einstein.

--
-- Sorry, I can't think of anything funny to say here.
Re:It's radix sort. by Anonymous Coward · 2007-02-26 06:52 · Score: 0

Who do you thinks going to get the credit when they finally invent fission? The person who thought it up or those that actually made it? I'm going to go with Leo Szilard. He invented the fission reactor back in 1955 http://inventors.about.com/library/inventors/blnuc learfission.htm
Now as for who's going to get credit for inventing FUSION? I'm not sure but probably the first person that comes up with a successful implementation.
Re:It's radix sort. by drxenos · 2007-02-26 06:53 · Score: 1

That's not necessarily true. Sorting algorithms can and usually are categorized by whether they are in-place or not (along with their complexity and stability).

--

Anonymous Cowards suck.
Re:It's radix sort. by AnalogFile · 2007-02-26 11:41 · Score: 1

ARGH! Yes! I got it totally mixed up.
Here's take 2

Fission is not an invention. It is well known and has been for several years. It's the nuclear reaction that forms the basis of A bombs and nuclear power plants. Early studies are credited to Enrico Fermi

In case you actually ment "fusion", that also is not an invention. And it also has been known for some time. It is the nuclear reaction that makes stars visible and that is used in the H bombs.

We also know that cold fusion is possible (while cold fission is not). It has been done for 20+ years. Both fission and cold fusion are controllable and eventually usable for something different than bombs. Fission is used in nuclear power plants. Cold fusion is not because very fiew of the currently possible tecnologies for controlled cold fusion are energy positive (that is, most of them actually consume energy instead of producing it). One very interesting research, that started in Canada, had several years of development in Italy and is now continuing in Japan (I think Mitsubishi is doing it, but am not sure) is in the use of controlled cold fusion to destroy the radioactive nuclear wastes produced in fission reactors commonly in use to produce steam and, from that, electricity.

I hope this now makes sense. Sorry for the confusion.
Re:It's radix sort. by Anonymous Coward · 2007-02-28 10:53 · Score: 0

Fission?

Umm, I hate to break it to you, but they did that already... Enrico Fermi and friends.

There was even a very practical demonstration project.

It was called the Manhattan Project, you may have heard of it.

Please feel free to claim credit for inventing nuclear fission, since it's been awhile this should be quite novel.

While you're at it, nobody has taken credit for the law of gravity in awhile, either.

If you can tear yourself away from inventing perpetual motion engines long enough, that is.

I am firmly convinced that human stupidity is the single most relevant causative force known to man, and I am in the process of gathering sufficient empirical evidence to prove that theory. Thank you for participating.
Re:It's radix sort. by Anonymous Coward · 2007-02-28 11:03 · Score: 0

I am firmly convinced that human stupidity is the single most relevant causative force known to man, and I am in the process of gathering sufficient empirical evidence to prove that theory. Thank you for participating. Followed by human arrogance as a close second.

Nice, but... by i+kan+reed · 2007-02-25 07:49 · Score: 1, Informative

It's still in the N log(N) time class. Mergesort has minimal time overhead to prepare and only N memory overhead. Do we have that for this algorithm?

Re:Nice, but... by Anonymous Coward · 2007-02-25 08:05 · Score: 0

Radix sort exploits the representation of the keys to be compared. It is not general purpose and its running time cannot be measured purely as a function of 'N' like for comparison-based sorts.
Re:Nice, but... by Anonymous Coward · 2007-02-25 08:08 · Score: 0

Mergesort on a linked list requires constant space.
Re:Nice, but... by i+kan+reed · 2007-02-25 08:25 · Score: 1

Mergesort on a linked list requires constant space.
You're forgetting the recursive stack overhead, which is O(LOG2(N)). Still, you're right, that's not N.
Re:Nice, but... by dlthomas · 2007-02-25 08:30 · Score: 5, Informative

I guess it technically *is* O(N * log(N)) to the number of items, but this is misleading. It's actually O(N).

As pretty much everyone has pointed out, it's just radix sort. The time taken by radix sort scales linearly to the number of keys, and with the log of the maximum key that can be held by the container.

If we're dealing entirely with unique keys, this is of course >= log(N), by the pigeonhole principle and all that. If we may have duplicate keys, however, there may be more keys than container space, and radix *does* scale linearly to the keys.

Time to prepare and overhead of this alrogrithm are negligable in any context large enough for us to care about the O() of the algorithms. For some things, this *is* better than mergesort. I'm just not sure why it was posted, as it's also not new - it was invented over 50 years ago.
Re:Nice, but... by quanticle · 2007-02-25 08:52 · Score: 1

Nope. This is a reinvention of Radix Sort, which is O(n).

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:Nice, but... by Xaroth · 2007-02-25 09:58 · Score: 1

I'm just not sure why it was posted, as it's also not new - it was invented over 50 years ago.

That's /. for ya.

--
That green slime had it coming.
Re:Nice, but... by ioshhdflwuegfh · 2007-02-25 23:23 · Score: 1

Congratulations on the interesting post.
If we're dealing entirely with unique keys, this is of course >= log(N), by the pigeonhole principle and all that. If we may have duplicate keys, however, there may be more keys than container space, and radix *does* scale linearly to the keys.
What is then the sorting time of the Radix algorithm? I mean, mathematically it cannot be just O(k*N) where k is the number of keys, because k depends on N in general. Now, if all the keys are different (which is for many applications the most important case), what is the sorting time in terms of N? As far as I know, Quicksort's typical sorting times are much better than those of the Radix.
Re:Nice, but... by dlthomas · 2007-02-26 04:30 · Score: 1

With unique keys, it's probably fair to call it O(N * log(N)).

Radix Sort by doopokko · 2007-02-25 07:51 · Score: 0, Redundant

How is this different from Radix sort?

Re:Radix Sort by Anonymous Coward · 2007-02-25 07:53 · Score: 5, Funny

How is this different from Radix sort?

Marketing.
Re:Radix Sort by pedantic+bore · 2007-02-25 11:05 · Score: 1

Where are mod points when I need them? If I had any, they'd be yours.

--
Am I part of the core demographic for Swedish Fish?

Almost accurate. by imbaczek · 2007-02-25 07:52 · Score: 0, Redundant

This isn't sorting linked lists, it's sorting linked list of NUMBERS (or, more specifically, N-bit strings.) Doing that fast is hardly rocket science, and this bitfast is nothing new... That's basically radixsort with radix = 2^16.

Only lists of integers by iamacat · 2007-02-25 07:54 · Score: 2, Informative

Nice, but you could just copy the list to array and do counting/radix sort. Does nothing for strings, floating point values, structures...

Re:Only lists of integers by antifoidulus · 2007-02-25 08:23 · Score: 3, Interesting

Not only integers, more than likely only POSITIVE integers. If you are using 2's complement that would mean that the first bit in all negative numbers would be a 1. Unless I missed something, this algorithm does not take that into account and would sort them as being larger than all positive numbers.

--
Monstar L
Re:Only lists of integers by StikyPad · 2007-02-25 09:49 · Score: 2, Funny

Does nothing for strings

ahhiiinnoooprrssssttt

--
https://www.eff.org/https-everywhere
Re:Only lists of integers by Anonymous Coward · 2007-02-25 10:00 · Score: 0

So? Cut the list when you're done at the maximum positive value--take everything before this pivot point and add it to the end. Done.

Welcome to the wonders of two's complement.
Re:Only lists of integers by Anonymous Coward · 2007-02-25 10:15 · Score: 0

Does nothing for strings

ahhiiinnoooprrssssttt Iran rots piss on hot shit?
Re:Only lists of integers by Jorrit · 2007-02-26 02:58 · Score: 1

Even easier. Just add 1^31 to the numbers before comparing the bits so it is positive.

Greetings,

--
Project Manager of Crystal Space (http://www.crystalspace3d.org). Support CS at http://tinyurl.com/cb3x4

radix sort vs. comparison sort by Peter+Desnoyers · 2007-02-25 07:55 · Score: 5, Informative

It's a bit of an apples vs. oranges comparison to put this up against mergesort - mergesort is a comparison-based sort, while Papadopoulos' bitfast is a radix sort and thus O(N*W) where N is the number of elements and W is the width of each element in bits. (hint - try sorting 1000-byte strings with bitfast, and see which is fastest) And no, it doesn't have anything to do with hashing.

However, it's definitely a clever way of implementing radix sort with linked lists, which may make it useful in some cases (e.g. OS internals) where you don't want to allocate space for a big directly-addressable array.

Re:radix sort vs. comparison sort by Ignorant+Aardvark · 2007-02-25 08:20 · Score: 3, Interesting

I ditto this. Mergesort is a true comparison sort, which means it can sort anything so long as a comparison between two values is defined, e.g. strings (in lexigraphical order), floating point numbers, 1000-digit numbers ... you get the drift. BitFast is a radix sort and only works in cases where the list elements obey certain limitations: for instance, if each list element can be expressed in 32 bits (an int).

--
Cyde Weys Musings - Scrutinizing the inscrutable
Re:radix sort vs. comparison sort by Wavicle · 2007-02-25 09:30 · Score: 5, Informative

Actually it is much worse than you say.

I don't know what possessed me to look at his code, but he really hasn't shown what he claims to have shown. It isn't clear to me that his code is doing what he says, but what IS clear to me is that his comparison to mergesort is very unfair. He has a TRULY AWFUL implementation of merge sort. For example, every time he goes to split a sub-list in half, he does a linear search from the current node to the end the sub-list. Having determined this value he then does ANOTHER LINEAR SEARCH to find the half-way point.

So he has basically made an O(N^2) time complexity process just to divide the list for the merge sort. This is inside his n*log(n) merge sort. So he has an O(n^3) time complexity mergesort and trumps up how fast his modified radix sort is. Come on! Bubble sort would have beat his mergesort.

His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.

--
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
Re:radix sort vs. comparison sort by Tim+Browse · 2007-02-25 10:30 · Score: 5, Insightful

I don't know what possessed me to look at his code
Damn you! You made me look at his code! The goggles, they do nothing.
His cleverness gets the better of him when it comes to his modified radix sort. For example, he creates two arrays (on the stack) of 65535 elements; apparently unaware that this creates an array with indexes 0..65534.
I can't help feeling he should have declared Ptrs and Tails as Node* arrays, and bypassed all that random casting to longs. Not sure what's going on there. But then anyone who thinks the roundabout way he used of accessing the top 16 bits of a 32-bit memory value is 'cool' is definitely on my list of people most likely to re-invent the radix sort badly.
At first, I thought it wasn't a stable sort, but looking further into it, that's because he mixed up the 'head' and 'tail' descriptions in the explanation (or possibly in the code).
It is amusing that some /. posters think that doing this in-place is somehow an amazing leap of insight. Sometimes /. is like reading thedailywtf.com - you see something dumb as the main story, and then find half the people commenting on it have even less of a clue.
I must be new here.
Re:radix sort vs. comparison sort by Wavicle · 2007-02-25 11:40 · Score: 4, Informative

But then anyone who thinks the roundabout way he used of accessing the top 16 bits of a 32-bit memory value is 'cool' is definitely on my list of people most likely to re-invent the radix sort badly.

It is clearly a slow day for me. I have gone one step further and ported his code over to Linux (only required a change to GetTickCount) and attempted to run it. His "cool" way of using a pointer hack has caused his code to dump core. The problem is his hack being a signed short int pointer.Program received signal SIGSEGV, Segmentation fault. 0x08048752 in BitFast (TailNode=0x804a018, run=0) at bitfast.h:37 if (Tail[val]) { //if not 0 (gdb) p val $1 = -22087
So I try putting unsigned on ptrHack.Program received signal SIGSEGV, Segmentation fault. 0x08048772 in BitFast (TailNode=0x804a018, run=0) at bitfast.h:39 tmpN->nextNode = pN; (gdb) p tmpN $1 = (Node *) 0xffff
Yeah. Not so sure that casting is what he really wants. I'm not entirely clear how it is his code will work. I really wanted to clean up his mergesort and re-run it to see how valid his claim is, but it looks hopeless. I'll work on it a little more, but since the problem seems to be in his algorithm, it doesn't look real hopeful.

--
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
Re:radix sort vs. comparison sort by ageforce_ · 2007-02-25 13:13 · Score: 1

I can't reproduce your O(N^3). At every level he runs 2.5 times through the list. Once to get its length. 1/2 to split and once to join. -> O(2.5N) = O(N). And there are log(n) levels. -> O(logN*N)
His implementation might not be efficient, but it is still in O(logN*N).
Re:radix sort vs. comparison sort by adrianmonk · 2007-02-25 15:01 · Score: 4, Informative

He has a TRULY AWFUL implementation of merge sort. For example, every time he goes to split a sub-list in half, he does a linear search from the current node to the end the sub-list. Having determined this value he then does ANOTHER LINEAR SEARCH to find the half-way point.

So he has basically made an O(N^2) time complexity process just to divide the list for the merge sort. This is inside his n*log(n) merge sort. So he has an O(n^3) time complexity mergesort and trumps up how fast his modified radix sort is. Come on! Bubble sort would have beat his mergesort.

That is maybe a bad implementation, but it actually does not make it O(n^3). In fact, it doesn't make it as bad as even O(n^2). Here's why:

The first thing I want to establish is that linearly iterating halfway through the sublist a second time (to find the middle of it) has no effect on the asymptotic analysis. It just means going through the sublist 3/2 times, which is still a constant.

The second thing is this: as you go deeper and deeper into the recursion, the size of the sublist is repeatedly cut in half. Thus, at every level as you go deeper into the recursion you are doing twice as many loops as the previous level (one loop for the first level, when the sublist equals the original list; two loops for the second, when you are breaking halves into quarters; four loops in the third level of recursion; and so on). BUT, although you are doing more loops, the size of the sublist is always half what it was at the previous level of recursion. So have you twice as many loops but half as many loop iterations.

That means at every level of recursion (think of this size of the call stack if that helps in visualizing it), you are looping over the entire list once (well actually 3/2 times). And there are O(log n) levels of recursion, because every time you recurse, you are working on a list half as big as before.

The result is that the total extra added time of this unnecessary linear traversal is O(n * log n) for the entire algorithm. Since mergesort is already O(n * log n), the extra looping to find the middle and end of the list has no effect on the asymptotic behavior of the algorithm.
Re:radix sort vs. comparison sort by will.murnane · 2007-02-25 15:03 · Score: 1

Where do you see this O(n^2) stuff going on? It looks to me like O(n). In listops.h (trimmed down for readability):
Node* splitlist(Node* p) {
if (not p) return NULL;
do {
p3 = p2;
p2 = p2->nextNode; /* advance 1 */
p1 = p1->nextNode;
if (p1) p1 = p1->nextNode; /* advance 2 */
} while (p1);
p3->nextNode = NULL; /* terminate 1st half */
return p2; }
So he does O(n) work whenever he calls splitlist. It's lousy style, perhaps, but not O(n^2). Mergelists is O(n) anyways, so you end up with T(n) = 2T(n/2) + O(n) just like well-done quicksort. I'd write the whole thing with non-recursive quicksort, I think, but I haven't tried to implement sorting on linked lists so I could be unpleasantly surprised. Not as surprised as I was viewing this code, though - could he not find a book on this stuff?

By the way, C++'s "new" operator allocates storage on the heap, not the stack. If he said long Head[65536] instead of long* Head = new long[65536] (why isn't he using constants for that number!?) it'd be on the stack.

Or rather, [65535] instead of [65536] in the last paragraph. Oops on his end.
Re:radix sort vs. comparison sort by kigrwik · 2007-02-25 20:30 · Score: 2, Interesting

Disclaimer: I didn't read the code, and don't intend to.

I don't know if you're still interested, and I'm just hypothesizing here, but 0xFFFF is 65535 and some other posters noted that his array was of size 65535, which of course means that index 65535 is out of bounds by just one. This could cause your segfault.

HTH,
Kig.

--
-- don't discount flying pigs until you have good air defense
Re:radix sort vs. comparison sort by Wavicle · 2007-02-26 04:48 · Score: 2, Informative

That was indeed the problem. I did eventually get the thing running and, interestingly, despite using many published mergesort algorithms for sorting linked lists, the bitsort algorithm generally ran a constant 3-4x faster. So it appears that I was wrong about his mergesort having an asymptotic characteristic of O(n^3) and improving the efficiency of his mergesort gave only incremental performance improvement.

Unfortunately this followup will forever be buried below the mod threshhold but it appears that despite my best efforts, radix sort (bitsort in this case) is superior to mergesort for linked lists. I tested different list sizes, different random ranges and a few other things I could think of, and bitsort always ran faster.

The only weirdness I could not get rid of is that every time I compiled using gcc -O2 I got segfaults. I did not have time to track it down, I suspect stack smashing due to array usage irregularities.

--
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
Re:radix sort vs. comparison sort by Anonymous Coward · 2007-02-26 06:59 · Score: 0

This was marked informative??? Where the hell did you learn algorithmic analysis???
Re:radix sort vs. comparison sort by avanaardt · 2007-02-26 08:39 · Score: 1

"Bubble sort would have beat his mergesort" My view exactly. Maybe someone should try a shoot-out using the same dataset?

wtf? seriously. by tomstdenis · 2007-02-25 07:55 · Score: 4, Informative

First, it's just a radix sort [well that's my take]. Second there are too many google ads on that page. Third, merge sort is O(n log n) time, hard to beat for random data. Fourth, most people use variations of the QUICK SORT not merge sort.

Tom

--
Someday, I'll have a real sig.

Re:wtf? seriously. by Anonymous Coward · 2007-02-25 08:07 · Score: 0

The point is not that this is some revolutionary way to sort, but that it is a fast way of sorting LINKED LISTS. Since linked lists do not have random access, you cannot perform a quicksort on them.

dom
Re:wtf? seriously. by slamb · 2007-02-25 08:07 · Score: 4, Insightful

Fifth, most people don't sort link lists.
Sixth, the headline "10X faster" is incorrect, as they differ asymptotically, not by a constant factor. (Run different data sets...vary by size of a single element, and by number of elements in the list. The ratio will change.)
Re:wtf? seriously. by phazer · 2007-02-25 08:10 · Score: 5, Insightful

Fifth, the author tries to "GPL" the algorithm, which is utter nonsense. GPL deals with copyright, so the most he can do is GPL his implementation of his bucket-/radix sort. Anyone is free to re-implement the algorithm, GPL or not.

It would require a software patent to restrict the use of the algorithm to GPL programs.

(And sixth, a quick look in a text book would have clued the author in)
Re:wtf? seriously. by Alaria+Phrozen · 2007-02-25 08:14 · Score: 1

Get NoScript for Firefox. It's annoying the first few days as you build a white-list, but now I don't deal with any crap. Google-analytics is forbidden and that page is now ad-free.
Re:wtf? seriously. by Anonymous+Brave+Guy · 2007-02-25 08:19 · Score: 3, Informative

Third, merge sort is O(n log n) time, hard to beat for random data.

Provably impossible, in fact.

Fourth, most people use variations of the QUICK SORT not merge sort.

I'm not sure that's a fair generalisation. There are several good O(n log n) sorting algorithms, heap sort being another obvious one, and typically things like the data structures involved, need for stability, or ease of parallelisation are the deciding factors in practice. In fact some of the best performing algorithms in real terms are theoretically O(n^2) in the worst case, hence the creation of hybrids like introsort, a quicksort variant designed to avoid the worst case by switching to heap sort after a certain recursion depth is reached.

Anyway, I'm afraid this is another article by a 21-year-old student who is well-meaning, but not nearly experienced enough yet to appreciate that his suggestion is just rehashing (no pun intended) a lot of long-established ideas. I think the giveaway sign is that his CV on the site still lists just about every programming language and web development skill in existence. :-)

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:wtf? seriously. by vidarh · 2007-02-25 08:37 · Score: 2, Insightful

Actually sorting linked lists is useful in many cases, and can be done fairly efficiently with a well written quicksort (without the step of converting to array that this code does), since quicksort is conceptually just recursive partition around a pivot element - partitioning a linked list is cheap and can sometimes be faster than using an array if the objects being sorted are complex and stored by value (i.e. in any case where moving the linked list pointers around would be faster than copying the objects). A variety of other sorts are also easily adaptable to linked lists.
Re:wtf? seriously. by TheRaven64 · 2007-02-25 08:44 · Score: 2, Insightful

Third, merge sort is O(n log n) time, hard to beat for random data. Provably impossible, in fact. Not quite. O(n log(n)) is only the best case for comparison-based sorting. That is, if you have a less than (or greater than) predicate, you can build an algorithm that is O(n log(n)) using it. If your algorithm is not comparison-based, you can beat O(n log(n)). The most obvious example is bucket-sort, where you have k possible values. You pass over your list once, placing the value in the correct bucket, which gives you O(n) performance, at the cost of requiring k buckets worth of storage space (you can create the buckets on the fly, but then you add a O(log(n)) search into each step of the algorithm, dropping it back to O(n log(n))).

--
I am TheRaven on Soylent News
Re:wtf? seriously. by Anonymous Coward · 2007-02-25 08:53 · Score: 0

Seeing skills on his CV like "Visual Basic ***Excellent***" really engender my confidence in his abilities as a developer. You're just being unfair!
Re:wtf? seriously. by SnowZero · 2007-02-25 09:02 · Score: 1

Quick sort does not require random access in order to be E[O(n*log(n))] on a shuffled linked list. Here is a good example. Even if you want to use the middle element as a pivot (good for near-sorted data), you can scan the linked list linearly for that element, and it will amortize to the same complexity.

Now, there is the question of whether you want to do it, but you most certainly can. For my unsigned integer linked-list sorting needs, I've found radix sort with a radix of 6 to work pretty well in practice.
Re:wtf? seriously. by GileadGreene · 2007-02-25 09:08 · Score: 1

Actually, you can perform quicksort on linked lists. But without random access, many of the optimizations you can use to make quicksort really "quick" can't be done, so there isn't much reason to prefer quicksort over a good mergesort implementation for linked lists.
Re:wtf? seriously. by Brian_Ellenberger · 2007-02-25 09:21 · Score: 1

First, it's just a radix sort [well that's my take]. Second there are too many google ads on that page. Third, merge sort is O(n log n) time, hard to beat for random data. Fourth, most people use variations of the QUICK SORT not merge sort.

For comparison searches, O(n log n) is not just hard to beat it is IMPOSSIBLE to beat.
See http://www.ics.uci.edu/~eppstein/161/960116.html.

Of course, radix sort is not a comparison sort, but has its own limitations as others have noted.

And Quicksort technically has a O(n^2) worst case (reversed list) but is more often picked over Merge Sort because on avg it is nLogn and doesn't require the extra memory merge sort requires.
Re:wtf? seriously. by Ivan+the+Terrible · 2007-02-25 09:23 · Score: 2, Interesting

> the best performing algorithms in real terms are theoretically O(n^2)

Agreed. For example, in a microcode example at work, the cost of searching for a given string was totally dominated by memory access time, so it really didn't matter which algorithm we used. So we used brute force. Pretty easy to debug, as well.
Re:wtf? seriously. by Anonymous Coward · 2007-02-25 09:33 · Score: 0

I'll second that. I saw the NoScript icon at the head of the page where it blocked the banner but decided to look at it with IE (no funky protection) because of all the griping here. I want to gouge out my eyes with a fork now.
Re:wtf? seriously. by Anonymous Coward · 2007-02-25 09:52 · Score: 0

> Third, merge sort is O(n log n) time, hard to beat for random data.

> Provably impossible, in fact. [Yadda yadda, I know more than he does, etc.]

And you're provably f***ing stupid. Reread your CLR.
Re:wtf? seriously. by plover · 2007-02-25 09:58 · Score: 0, Offtopic

I have used NoScript (it's on right now, but set to "global allow"), but I prefer using AdBlock Plus to build up a blacklist of various scripts and script-hosting sites (such as google-analytics.) NoScript is kind of irritating as it doesn't support black-listing -- if you have a site with some scripts you want run, but others denied, it still pesters you. And for some reason that I haven't bothered to figure out yet, NoScript is in my processing chain before AdBlock, so blacklisting in AdBlock doesn't prevent NoScript from whining.
Speaking of whining, does anyone know how to specify the order of processing of Mozilla extensions? I'd love to leave NoScript on to catch the stuff that I haven't told AdBlock to stop yet.

--
John
Re:wtf? seriously. by kasperd · 2007-02-25 10:31 · Score: 1

And Quicksort technically has a O(n^2) worst case (reversed list)
The worst case depends on how you choose the pivot element. If you choose the first or last element as pivot, the worst case will be something sorted in one direction. But if you choose the midle element as pivot, a sorted list will actually be the best case. What the worst case is for that choice of pivot, is a little more complicated to compute, and not something I'd expect to see except for adversarialy constructed lists. It is even possible to compute the meridian in linear time. Using that you could do quicksort with a worst case of O(n log n). But the overhead for computing the meridian would be so big, that other sorting algorithms would be a better choice.

--

Do you care about the security of your wireless mouse?
Re:wtf? seriously. by kasperd · 2007-02-25 10:35 · Score: 2, Informative

Fourth, most people use variations of the QUICK SORT not merge sort.
Maybe for data that fits in memory. But if the data is on disk and you can't fit all of it in memory, nothing beats multiway mergesort.

--

Do you care about the security of your wireless mouse?
Re:wtf? seriously. by ShakaZ · 2007-02-25 10:40 · Score: 1

Why don't you disable notification and stop whining about Noscript... seriously if you choose to set it to global allow, then you might as well stop using it. I can't even imagine why one would want Noscript to support blacklisting as then it wouldn't prevent malicious scripts from running when you'd visit new or updated websites, so it'd be just as useless as allowing all scripts to run.
Re:wtf? seriously. by ShakaZ · 2007-02-25 10:47 · Score: 1

Is there an anti-NoScript mod alliance here at work or what? The only guys who isn't modded down happens to be the one delivering nonsense about that extension...
Re:wtf? seriously. by ShakaZ · 2007-02-25 10:54 · Score: 1

Very strange, that last reply was modded down as soon as i hit the Submit button... looks like there is some Slashdot mod robot at work or something.
Re:wtf? seriously. by Dwedit · 2007-02-25 11:05 · Score: 2, Informative

Merge sort only requires extra memory to store a list of indexes or pointers, not extra memory for the full data. If your data is integers, then it's the same size. If your data is strings, then the array of indexes is smaller.
As a plus, merge sort requires no recursion, which may be helpful if your stack is really tiny, like 200 bytes large.
Re:wtf? seriously. by Anonymous+Brave+Guy · 2007-02-25 11:46 · Score: 1

You're right, of course. I foolishly assumed that we were talking about a fair comparison, which since merge sort is the other algorithm under discussion, kinda implies comparison-based sorting. In that context, you simply can't beat O(n log n) without additional information. On further examination, the algorithm in question seems to be some variation of radix sort, meaning that it isn't really a 10x faster substitute for merge sort on linked lists at all, unless the data in your linked lists happens to be of a very specific nature.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:wtf? seriously. by Paradise+Pete · 2007-02-25 11:54 · Score: 0, Troll

Sixth, the headline "10X faster" is incorrect, as they differ asymptotically,
And seventh, (and admittedly I'm really reaching here) the proper grammar for comparisons is "as fast," not "faster."
"Faster" is not logically correct, as can be seen more easily when the two approach equality. If, for instance, 10 seconds is ten times faster than 100, then to be consistent 100 must be one times faster than 100, which is just plain silly.
Re:wtf? seriously. by slamb · 2007-02-25 12:12 · Score: 1

Actually sorting linked lists is useful in many cases

Such as? I won't be so foolish as to claim it's never useful, but I will say there's a reason most of the focus is on sorting arrays.
partitioning a linked list is cheap

Partitioning an array is free. Comparisons in an array are O(1). Operations on linked lists mean traversing to the appropriate spot, an O(n) operation. The normal way people count sort operations is by the number of comparisons. If each comparison has a non-constant traversal cost, the cost of the whole is higher. I'm not sure off the top of my head exactly what it is. Tempting to say O(n^2 log n) instead of O(n log n), but since the size of the partitions (which bound the traversal distance) shrink through the sort, it might be O(n log^2 n) or something. I'm curious now; may have to work through the problem and find out.
and can sometimes be faster than using an array if the objects being sorted are complex and stored by value (i.e. in any case where moving the linked list pointers around would be faster than copying the objects). A variety of other sorts are also easily adaptable to linked lists.

Agreed, but your careful wording implies you've already thought of the other way - an array of references.
Re:wtf? seriously. by dkf · 2007-02-25 12:36 · Score: 2, Informative

The kings of the sorting hill are quicksort, heapsort, mergesort, and insertion sort. For real data, quicksort (especially with a good pivot selector sub-algorithm) and heapsort are fastest (heapsort is also very frugal with the stack). Mergesort is a bit slower (but still O(n log n) with reasonable factors) and has the major bonus of being stable, so that elements that are equal to each other are not reordered (and that's very useful indeed with real data). Insertion sort (also stable) is the only O(n2) algorithm worth considering, and then only in a special case: when you've got some sorted data and want to add an item to it, this being a case that comes up surprisingly often. Insertion sort is also really easy to implement correctly.

Radix sorts are reasonable, but only if the input keys are fairly short. I've never had data like that myself; for 10 character strings, a good quicksort is theoretically better for lists of a million keys or so, which is pretty good. ;-) Also, C programmers should definitely prefer to use quicksort, of course, because they've got an implementation of it in libc...

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:wtf? seriously. by linvir · 2007-02-25 13:05 · Score: 1

You replied to yourself. Seems like a likely kind of post to automatically mod-down.
Re:wtf? seriously. by slamb · 2007-02-25 15:00 · Score: 1

I said before:
Partitioning an array is free. Comparisons in an array are O(1). Operations on linked lists mean traversing to the appropriate spot, an O(n) operation. The normal way people count sort operations is by the number of comparisons. If each comparison has a non-constant traversal cost, the cost of the whole is higher. I'm not sure off the top of my head exactly what it is. Tempting to say O(n^2 log n) instead of O(n log n), but since the size of the partitions (which bound the traversal distance) shrink through the sort, it might be O(n log^2 n) or something. I'm curious now; may have to work through the problem and find out.

Hmm. Actually, I guess it is still O(n log n) [*]. Maybe this is what you (vidarh) were talking about: any time you have to traverse to the partition point, you'll also be looking at each element on both sides of it shortly afterward. So the traverse basically just increases your reads by about 50%, which is not an asymptotic difference. It might have a rather unfortunate interaction with the cache, however.
[*] - in the "average case", that is. Quicksort's worst case is O(n^2).
Re:wtf? seriously. by plover · 2007-02-25 17:29 · Score: 1

Because I approach it differently. I want both blacklisting as well as whitelisting. There are sites that I never want to allow scripts from, there are sites I always want to allow scripts, and there are sites with scripts that I don't know about yet. Those are the only ones where I want notification. Once I figure them out I'll throw them in the appropriate "good" or "bad" bucket; after that I'd like NoScript to shut up about them.
Sure, it'd be possible for a formerly benign site to create a "locally evil" script bypassing NoScript, but it's far more likely they'll farm something out to a third-party tracking firm like IMRworldwide.

--
John
Re:wtf? seriously. by Gotta+ask+yourself.. · 2007-02-25 19:57 · Score: 1

First off, this was all just a scam, now the page links to the wikipedia entry about Radix Sort, whilst it still contains a lot of Google ads. We're making the guy rich. :-)
Then, you can both apply quick and merge sort to a linked list, no need to have random access. Yes, it'll be slower than if you had random access to the lists item, but the complexity would be just the same on both cases.
If your list is doubly linked, however, you can apply merge sort to ot without any drawbacks and without requiring random access: you basically first split the list of n items in n circular lists, then you go power-of-two lists at time merging them together, till you only are left with one list, which will be the sorted one.
You use the "next" pointer of the last item in the list in order to keep a reference to the next list, whilst you use the "prev" pointer of the first item in the list to point at the first item of the list.
Here is an implementation of both quick and merge sort on doubly linked lists.
Re:wtf? seriously. by Anonymous Coward · 2007-02-25 20:52 · Score: 1, Interesting

"In fact some of the best performing algorithms in real terms are theoretically O(n^2) in the worst case, hence the creation of hybrids like introsort, a quicksort variant designed to avoid the worst case by switching to heap sort after a certain recursion depth is reached."

Introsort is silly, and STL is dumb to use it. I am a PhD student in CS theory, and I can safely say this is unjustified use of too much theory in practice. Using introsort is no more reasonable than STL's decision to heavily favor tree-based sets/map over the much faster hash-based sets/maps.

Quicksort with random pivots is faster than heapsort. It runs in O(nlog n) time on any input with extremely high probability unless the data generator somehow outguessed your random number generator (not bloody likely in any real situation). Furthermore, even if random quicksort has performed badly so far, it IS STILL FASTER to continue with quicksort than it is to switch to heapsort because past performance is no predictor of future performance with random pivots.

So what does STL do? They use a worse version of quicksort because they are scared of the wildly crazy theoretical implications of randomness, and then have to implement yet another sort on top of it because their original algorithm sucks. Nice.
Re:wtf? seriously. by ShakaZ · 2007-02-25 20:57 · Score: 1

Still there's no need for a blacklist for you to use NoScript the way you want. Noticed the little Noscript icon down there in the middle of the status bar? It changes depending if all scripts are allowed/ not allowed or if only some scripts are allowed... Notification is only useful to someone who has never used Noscript and doesn't know about that icon & the functionality it provides. I can't get easier than this : Bad site --> do nothing... Good site --> allow the needed scripts to run (2 clicks per script...wow) Testing site --> temporarily allow the scripts you think are useful
Re:wtf? seriously. by tcdk · 2007-02-25 21:57 · Score: 1

Anyway, I'm afraid this is another article by a 21-year-old student who is well-meaning, but not nearly experienced enough yet to appreciate that his suggestion is just rehashing (no pun intended) a lot of long-established ideas. I did this once in "advances data structures" class. The teacher was explaining about linked arrays, and I managed, all by my self, to invent the loosely linked multi dimensional array (or some such, it's been a bit), which I promptly shared with the class, thus ensuring that this monumental discovery could wouldn't go unused. The teachers response was along the lines of "yes, that would be Chapter 8 in your text book and we will working on that next week".

--
TC - My Photos..
Re:wtf? seriously. by ioshhdflwuegfh · 2007-02-25 23:38 · Score: 1

Third, merge sort is O(n log n) time, hard to beat for random data.
Provably impossible, in fact.
So what is the proof of impossibility, in fact?
Re:wtf? seriously. by Anonymous+Brave+Guy · 2007-02-26 02:00 · Score: 1

Please make at least a vague attempt to research before you post. Googling for "proof sort nlogn" turns up several papers giving the standard arguments for the lower bound.

(And yes, I'm only talking about comparison sorting here. Obviously you can use radix/bucket/whatever for specialist applications and these are lower than the bound, but they're not exactly equivalent algorithms to the merge sort referenced in the headline, are they?)

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:wtf? seriously. by Anonymous+Brave+Guy · 2007-02-26 02:03 · Score: 1

Hey, I didn't say the hybrids were an improvement! :-)

My point was that an algorithm that is fast in real terms -- such as quicksort -- may be theoretically less efficient in the worst case than O(n log n). That says more about the practical relevant of measuring complexity using big-O notation than anything else.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:wtf? seriously. by ioshhdflwuegfh · 2007-02-26 03:45 · Score: 1

Let's try one more time:

Third, merge sort is O(n log n) time, hard to beat for random data.
Provably impossible, in fact.
So what is the proof of impossibility, in fact?
to which you answer:
Please make at least a vague attempt to research before you post. Googling for "proof sort nlogn" turns up several papers giving the standard arguments for the lower bound.
which does not answer the question, does it?
(if the lower bound for merge-sort is O(N log N), then what is "the proof of impossibility, in fact", in fact?)
(And yes, I'm only talking about comparison sorting here. Obviously you can use radix/bucket/whatever for specialist applications and these are lower than the bound, but they're not exactly equivalent algorithms to the merge sort referenced in the headline, are they?)
(every sorting algorithm is comparison based, by definition. Then "specialist application" means that there is some additional knowledge about objects that simplifies sorting, like properties of associated keys or, for a (non)trivial example, when one wants to sort an array that is known to be in descending order into ascending order, etc. Once one learns something about the dataset one is sorting, it might be preferable to get an efficient sorting algorithm from another standard sorting algorithm even if they are "equivalent".)
Re:wtf? seriously. by plover · 2007-02-26 04:58 · Score: 1

I'm sorry you don't understand. It does not work the way *I* want. I like the notifications -- and as long as the notifications are limited to "once per site" I can deal with them as they arise. I'll blacklist or whitelist them at that time. But just like the notifications, the icon can't tell me the difference between a site blocking a script that I haven't looked at and blocking a script that I've already evaluated as bad.
Believe me, I've tried to get NoScript to do what I want. You're right -- as it's configured now, it's useless on my box. The only reason I still have it installed is to keep up-to-date on the release notes to see if he's added blacklisting functionality. And I see in the forum that other people have requested blacklisting, because apparently I'm not the only person who works this way. I know it's not an easy thing to add. But until it's there, it doesn't do what I want; it doesn't work the way I want it to work.

--
John

10X speedup? Compared to what? by DaleGlass · 2007-02-25 07:57 · Score: 2, Insightful

It's useless to talk about the speed of a sort algorithm without discussing its complexity.

Linear improvements are completely uninteresting. Take algorithm X in Perl, then a version in optimized assembly, and there you go, a 10X improvement without having done anything groundbreaking.

A 10X faster bubble sort is just as useless for large enough values of n, and a 10x slower merge sort would still have a very good performance in comparison.

Re:10X speedup? Compared to what? by kasperd · 2007-02-25 10:14 · Score: 1

True, having the right complexity class is more important than a constant factor of 10. But if you have two algorithms with the same complexity class, there can be a practical difference. And there are also some theoretical measures that would make sense such as the number of comparisons. However if you take that measure, I believe mergesort has been proven to be less than a factor of 10 from optimal. So the only way to beat mergesort by a factor of 10 would be something which is not a comparison sorting algorithm, and thus something that only works in cases where the keys have a special structure. Beating mergesort by a factor of 10 in the general case is just not possible. BTW when doing a mergesort of a linked lists, you can do some neat optimizations that improve the best case complexity in cases where large parts of the input are already sorted - of course this have a small additional cost in the average case.

--

Do you care about the security of your wireless mouse?
Re:10X speedup? Compared to what? by rbarreira · 2007-02-25 21:31 · Score: 1

Linear improvements are completely uninteresting. Take algorithm X in Perl, then a version in optimized assembly, and there you go, a 10X improvement without having done anything groundbreaking.
That's the kind of thinking which leads to the current kind of bloated and slow programs. Of course a better algorithm is always the first step, but programmers should always try to optimize their bottleneck algorithms. Would you like it if your software was 10 times slower? What about 100 times slower?

In my opinion, it is ridiculous to say that linear improvements are irrelevant. If we believe the current predictions about CPUs (number of cores increasing as opposed to GHz), this is even more true, since the old excuse of "the hardware will get better in 6 months" will stop being true.

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

Nice try, b ut... by Ancient_Hacker · 2007-02-25 07:57 · Score: 4, Informative

I think you may have re-invented the "radix sort".

It was used by 80-col card sorters, since circa 1925.

See Knuth, Volume 3

Re:Nice try, b ut... by RevMike · 2007-02-25 08:22 · Score: 1

I think you may have re-invented the "radix sort".
It was used by 80-col card sorters, since circa 1925.
I was thinking the same thing. One of my coworkers told me about using those beasts in the 1950s while serving in the Army.
Re:Nice try, b ut... by DrPepper · 2007-02-25 09:39 · Score: 1

Hi,
Java Blog for Experienced Programmers - http://www.0xcafebabe.com/ [0xcafebabe.com]

You might like to know that the above domain (which I assume is yours from the DNS registration) seems to have been parked.

multithreaded merge by Zork+the+Almighty · 2007-02-25 07:59 · Score: 1

Since the article has nothing, I wanted to ask: has anyone implemented merge sort where you merge from both the front and back using two simultaneous threads ? I haven't seen this anywhere, but it's an obvious improvement for multicore machines. Does anyone know how to further parallelize merging (or for that matter, heaps) ?

--

In Soviet America the banks rob you!

Re:multithreaded merge by Zork+the+Almighty · 2007-02-25 08:07 · Score: 1

Nevermind, I figured it out. You need to estimate the median (like quicksort) to parallelize merging to four cores. If anyone still has suggestions for heaps, I'd like to hear it though.

--

In Soviet America the banks rob you!
Re:multithreaded merge by Anonymous Coward · 2007-02-25 08:11 · Score: 0

The parallelization of merge sort is nothing new, and it is rather simple to do because of the recursive nature of algorithm. I've seen parallel versions of merge sort with load balancing to sort real large data quantities.
Re:multithreaded merge by Anonymous Coward · 2007-02-25 08:12 · Score: 0

Check this out.. The generalization is here..

http://www-unix.mcs.anl.gov/dbpp/text/node123.html

http://www-unix.mcs.anl.gov/dbpp/text/node127.html
Re:multithreaded merge by SuperMog2002 · 2007-02-25 08:16 · Score: 2, Interesting

I've never seen it done before, but I don't imagine it would be particularly diffucult to pull off. I'd say just add another parameter that tells the function how deep in the call stack it is. If it's less than x layers deep, fork with the child process going on to the other processor. The parent process sorts the first half of the list, and the child process sorts the second half. When the parent process finishes, it waits for the child process to finish (if it hasn't already), then collects the child's sorted list and merges as usual. The reason you'd want to track how deep you are is so that you don't wind up forking when there aren't any more processors available. For instance, if you were running on a two processor machine, you'd only want to fork once, since forking again would only add overhead without any gain. You could probably generate x on the fly based on the number of processors in the user's computer, too.

The only problem I forsee is the case where the added overhead exceeds the gain. However, I imagine this would only be the case when you are sorting very small lists. Perhaps one could check the size of the list first and then decide whether or not forking is worth it.

--
Sunwalker Dezco for Warchief in 2016
Re:multithreaded merge by TERdON · 2007-02-25 08:26 · Score: 4, Informative

I have studied a course in parallel algorithms, including localized parallel merge sort (the algorithm you requested). It can be used to subdivide parallelized sorting theoretically unlimitedly. Links: Course homepage, The relevant chapter (PDF) of the course slides, with nice pictures and everything.

--
I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
Re:multithreaded merge by DamnStupidElf · 2007-02-25 08:44 · Score: 1

Since the article has nothing, I wanted to ask: has anyone implemented merge sort where you merge from both the front and back using two simultaneous threads ? I haven't seen this anywhere, but it's an obvious improvement for multicore machines. Does anyone know how to further parallelize merging (or for that matter, heaps) ?

There are entire classes of sorting algorithms for multiprocessor machines, including algorithms optimized for shared versus non-uniform memory architectures, message passing, sorting networks, and much much more. In general for m processors, sorting becomes an O(n log n / m) problem just as you'd expect (or hope).
Re:multithreaded merge by Helios1182 · 2007-02-25 09:05 · Score: 1

To go even one step better you can use a sorting networks and get O(lg n) complexity. Of course you need a network of size O(n lg n). Given a mesh-based computer you can sort N^2 items in O(n) time. But for now most of us are stuck with serial computers with small attempts at parallelism.
Re:multithreaded merge by CoughDropAddict · 2007-02-25 13:24 · Score: 2, Informative

I notice that the course homepage talks about designing for an "idealised parallel computer" -- did it take into account the huge speed hit that comes from false sharing on real processors? That should be a major consideration if you are designing shared-memory parallel algorithms, since they can degrade performance by 1700% on modern processors. If you're not careful, your parallel algorithms could end up being much slower than single-threaded ones simply because they force so many cache bounces.
Re:multithreaded merge by TERdON · 2007-02-25 13:32 · Score: 1

No, the course didn't cover this (at least not as a main topic - nonlinear speedups were mentioned, but not much more). It was an academic, theoretical course on algorithms, so practice was kind of left out... :)

Also, the course really wasn't limited to shared memory architectures, the algorithms could just as well have been used in message-based architectures, or clustered workstations, or a data-flow-oriented architecture, or...

Also, the problem of false sharing can be handled if properly avoided at design time (as a simple google search shows).

--
I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...

animation by hansamurai · 2007-02-25 07:59 · Score: 2, Funny

Let me know when there's a video on Youtube of the sort doing it's thing with colored bits and animation.

--
Reviewing just the first hour of video games.

Re:animation by Ossifer · 2007-02-25 08:26 · Score: 1

You can be sure such a video would be just as filled with ads, product placements, etc.

And just as unscientific...
Re:animation by Anonymous Coward · 2007-02-25 09:46 · Score: 0

http://www.youtube.com/results?search_query=sort+a lgorithm&search=Search

knock yourself out.
Re:animation by fbjon · 2007-02-25 11:40 · Score: 1

Animated demos for 15 different sorts, including radix sort, plus one broken sort: Sorting Algorithms Demo

--
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.

Re:FROTHY PISS by Anonymous Coward · 2007-02-25 08:03 · Score: 5, Funny

No.
You are an "HTML programmer".
An oxymoron and a moron, simultaneously.
Defenestrate yourself.

Apropos by heli_flyer · 2007-02-25 08:05 · Score: 1

"Those who cannot remember the past are doomed to repeat it." - George Santayana

Re:Apropos by T3hFish · 2007-02-25 09:37 · Score: 1

ARG! That is one of the most often misquoted phrases...
The correct quote is: "Those who cannot remember the past are condemned to repeat it." -- George Santayana
I guess it's a minor detail, but still...

--
"A witty saying proves nothing." -- Voltaire
Re:Apropos by UserGoogol · 2007-02-25 10:49 · Score: 1

Doomed sounds cooler, though. We quote things like that not because we really care what George Santayana thought on the issue, (George Santayana is a somewhat obscure figure outside of his tendency to pump out pithy sayings) but because they sound cool and express sentiments which are generally accepted without much support beyond their assertion.

Although of course I'll still be a pedant when I find a misquote that rubs me the wrong way.

--
"Never attribute to malice that which can be adequately explained by stupidity." -- Hanlon's Razor
Re:Apropos by asylumx · 2007-02-25 13:09 · Score: 2, Funny

"Those who cannot remember the past are doomed to repeat it." - George Santayana
Re:Apropos by Anonymous Coward · 2007-02-25 13:27 · Score: 0

"Those who cannot remember the past are doomed to repeat it." George Santayana

That doesn't matter! by Anonymous Coward · 2007-02-25 08:06 · Score: 5, Funny

This was written by VirusFree! Who wouldn't want to use something by a guy named VirusFree? It's a million times better than something by FreeVirus.

Re:That doesn't matter! by ScrewMaster · 2007-02-25 10:06 · Score: 4, Funny

No difference ... he's just saying that you can have his Virus ... Free.

--
The higher the technology, the sharper that two-edged sword.

oh really? by atamyrat · 2007-02-25 08:10 · Score: 1, Insightful

First of all, algorithm running times and complexities are not compared as 3x or 5x faster. Please give me running time in Big O notation. Saying 10 times faster than merge sort makes no sense.

MergeSort was the state of the art How about QuickSort?
AFAIK, quicksort (as well as mergesort) is optimal algorithm for comparison based sort, which means you can not find anything asymptotically better than that.

And how about submitting it to academic journals if you come up with something better?

Re:oh really? by Ossifer · 2007-02-25 08:32 · Score: 2, Funny

> First of all, algorithm running times and complexities are not compared as 3x or 5x faster. Please give me running time in Big O notation. Saying 10 times faster than merge sort makes no sense.

But I came up with a sorting algorithm yesterday that's 12 times faster than bubblesort! Click here to see my advertisements....

Seriously, mergesort (but NOT quicksort) is optimal, O(n log n) in one model. But there are sub-n log n sorting algorithms in other models. Some of them are based upon radix sort, surprise, surprise....
Re:oh really? by Anonymous Coward · 2007-02-25 08:40 · Score: 0

Dude, O(N log N) is not the best case for sorting. There are known algorithms for doing a little better than that. But not a lot better so using quicksort still makes sense.
Re:oh really? by Stalus · 2007-02-25 08:53 · Score: 2, Insightful

Well, consider the source. It was written by a 21 y/o student who has listed his primary language as Visual Basic. Obviously he isn't trained in theory. That said, a nice discovery for him, but not Slashdot news, and definitely not academic journal material.

Quicksort may have a best expected sorting time, but still has an O(n^2), since O is worst case. I think the STL takes a pass at the list to determine how sorted the list appears to be and then picks a sorting algorithm based on that... but it's been awhile since I looked at it.
Re:oh really? by Mr.+Underbridge · 2007-02-25 09:11 · Score: 1

Please give me running time in Big O notation [wikipedia.org].
Oh, but he says he didn't want to go into a bunch of theory. In other words, he doesn't understand the theory. In addition to theory, add copyrights and patents to the list of things he doesn't understand:
The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree
but you may use/modify it freely.

1. You release the code under the GPL, not the algorithm. 2. You don't 'own' the algorithm. You might patent it, but you haven't (and obviously won't since it appears to be radix). Additionally, to GPL the code, you'd have to provide an unlimited redistribution license to the patent if it existed.
Re:oh really? by Purple+Screws · 2007-02-25 09:21 · Score: 1

It all depends on what you mean by "optimal."

Merge sort and heap sort are both O(n log n) for arbitrary input -- they'll always take about the same amount of time no matter what the input data looks like. And O(n log n) is the best you can possibly do on average.

Quicksort, in contrast, is highly suboptimal -- O(n^2) -- in the case where the data are already in order. It only approaches O(n log n) behavior when the data are completely randomized. In general, of course, it depends on what data you're looking at, but quicksort performance will be somewhere in between these two cases.

People USE quicksort because it's optimal for many situations that people typically work with. That's not because of its asymptotic behavior, but because of its small constant factor.

Remember, our computers are not Turing machines. We never actually work with the average case. And n is a lot smaller than infinity.
Re:oh really? by DeKO · 2007-02-25 09:55 · Score: 2, Insightful

It only approaches O(n log n) behavior when the data are completely randomized.

No. Quicksort is O(n log n) on average. There are pathological scenarios, but all other instances absorb the O(n^2) worst case. Sedgewick's PhD (under Knuth's) was about Quicksort, try reading it if you are nuts enough. Or even read Knuth's volume on sorting, to see hem distilating and comparing even the constants from the algorithms.
Re:oh really? by Anonymous Coward · 2007-02-25 10:19 · Score: 0

He said *comparison-based* sorting for which you *can't* do better than O(N log N). This is proved in any entry CS undergrad class.
Re:oh really? by Anonymous Coward · 2007-02-25 11:16 · Score: 0

Algorithmic complexity is usually given in big-oh notation, but that doesn't mean it's not reasonable or valuable to compare the constant factors in algorithms. You can ignore implementation-related overhead but still consider, for example, total number of comparisons made between two elements in the list. One algorithm might be make no more than (n lg n) in the worst case while another might take as many as (4 n lg n). If the expected complexity and variance are the same for both algorithms and they have similar implementation overhead, the first algorithm would be preferable.

Incidentally non-randomized quicksort has O(n^2) complexity, but O(n lg n) expected complexity on a randomly ordered list. As convenient as it is to reduce "performance" to a single expression, things like constant factors and expected complexity variance actually matter.
Re:oh really? by j-pimp · 2007-02-25 13:22 · Score: 1

The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree
but you may use/modify it freely.

1. You release the code under the GPL, not the algorithm. 2. You don't 'own' the algorithm. You might patent it, but you haven't (and obviously won't since it appears to be radix). Additionally, to GPL the code, you'd have to provide an unlimited redistribution license to the patent if it existed.

Additionally, he should place this sort under the BSD license. At least I would. Something basic like this should be available for closed and open source applications.

--
--- Justin Dearing http://www.justaprogrammer.net/ We're just programmers.
Re:oh really? by Anonymous Coward · 2007-02-25 13:44 · Score: 0

Quicksort usually has a worst-case O(n^2) runtime, but it doesn't have to. It is possible to implement a guaranteed Theta(n log n) Quicksort. Excerpt From "Introduction to Algorithms", 2nd ed., by T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, p. 149 (section 7.2, "Performance of quicksort"):

"The running time of quicksort depends on whether the partitioning is balanced or unbalanced and this in turn depends on which elements are used for partitioning. If the partitioning is balanced, the algorithm runs asymptotically as fast as merge sort..."

A little further on, the section "Balanced partitioning" shows how this can be achieved and gives a formal proof of its Theta() bound. However, given that the expected worst-case runtime of randomized quicksort is O(n log n), the true Theta(n log n) quicksort isn't usually implemented. It's guaranteed worst-case performance is asymptotically better, but its average-case performance is slightly worse than randomized quicksort (by a constant factor only), and the "random" part in randomized quicksort doesn't depend on your data. (I.e., randomized quicksort won't perform poorly on bad input data. It will only perform badly if your input data is bad AND you get a really unlucky string of numbers from your rand() function.)
Re:oh really? by Anonymous Coward · 2007-02-25 16:31 · Score: 0

As a note, Quicksort is NOT optimal - its worst-case performance is O(n^2) rather than O(nlgn). What you are looking for is Introsort, a variant of Quicksort that switches to a worst-case O(nlgn) algorithm (usually Heapsort) after a constant recursion depth is reached. Introsort is worst-case O(nlgn) and still significantly (though not asymptotically) faster than Mergesort in the average-case.
Re:oh really? by mgiuca · 2007-02-25 17:58 · Score: 1

I haven't seen the article - it seems to be vanished or something. Anyway since everyone's saying it's just RadSort I'll go on that.

The difference between Radix Sort and one of those other sorts is in the complexity. Most sort algorithm's complexity is based on the number of elements solely. That is, Qsort and MergeSort are O(nlogn), Selection sort is O(n^2), etc. (Forgive me if I got any of these wrong, it's been awhile).

The point is that all of these sort algorithms have a maximum complexity of nlogn. Nobody has ever developed (nor is it thought possible) an algorithm which sorts faster than nlogn.

Enter Radix Sort: Which sorts with complexity O(n*m). Where m is the number of "bits" or the number of "radixes" you're going to select. So in general, this is nowhere near as good as Qsort or Mergesort, since m could be anything. A good example is for sorting strings - you can use radix sort (or I believe it's called "bucket sort" when you're not using bits), but the complexity will be the number of strings times the number of characters.

However, Radix sort shines when there is a fixed number of bits. Obviously that's true of integers. If you're sorting 32-bit integers, your complexity is suddenly O(n*32), which of course reduces to O(n). So, wow, we have a linear-time sorting algorithm! This is a good thing, but with three important caveats:

1. The constant multiplier factor is much larger than for most algorithms (ie. it's actually going to take 32 times longer than you might think just looking at O(n)). However, the reason why we eliminate constant multipliers from O notation is that they're meaningless with large data sets. So if you're sorting 10000 integers, then n*32 is far better than nlogn. In other words radix sort keeps getting better as the data size increases, but the bit size remains the same.
2. This doesn't work, or doesn't work well, in general. It only works on specific data types, such as integers. Other data types, such as strings, will perform poorly. Furthermore, the entire implementation is data-type dependent (that is, it has to be written totally differently for sorting ints, floats, strings, etc). This is as opposed to almost any other sorting algorithm where you just provide a comparison function for two elements of a given type, and it works.
3. I just thought of this but it's a biggie: This whole algorithm is very low-level. That is, it's language-depentend, platform-dependent and compiler-dependent as to how things are stored at the bit level. You need to be aware of if numbers are in two's complement. You need to know the exact floating-point representation if you're sorting FP numbers (and not all FP representations can even be bit sorted). And your language needs to support bit accesses - this is not possible in high level languages like Python or Haskell or maybe even Java. So if you write radix sort, even in C, it won't be properly portable code.
Re:oh really? by drxenos · 2007-02-26 07:08 · Score: 1

O is not worse case. O is an upper bound. You can talk about best and average Big-O (and we usually talk average), not just worse case. Quick sort is usually listed as being bounded in time with its average case of O(n*log n), not its worse case of O(n^2). Also, remember that Big-O (and other type of bounds) can be used to bound other resources, like memory, not just time.

--

Anonymous Cowards suck.
Re:oh really? by drxenos · 2007-02-26 07:12 · Score: 1

The C++ Standard does not dictate what algorithm the STL is required to use (though is does define bounds). Also, it defines more that one sorting function (i.e., stable and non-stable). Most professional libraries that do not require a stable sorting algorithm usually utilize the introspection sort.

--

Anonymous Cowards suck.
Re:oh really? by Stalus · 2007-02-26 08:10 · Score: 1

Yes, O is an upper-bound, so stating that quicksort has an O(n^3) is technically correct. However, the least-upper bound is most informative, and this is in fact the worst case scenario, O(n^2). Saying O(n log(n)) is incorrect. Saying it has an O(n log(n)) expected time isn't much better, but in some situations, it's easier to think that way.

Honestly, I think the reason it's commonly used that way is because it's introduced in intro data structures courses, where most algorithms have a similar Big-O and expected running time. Quicksort tends to be one of few interesting exceptions, and rather than spend the time explaining the distinction between expected time and Big-O, teachers invent "average Big-O".

Even so, both pieces of information are needed to be useful. If you give me a quicksort and you tell me that it has O(n log(n)), and I find that I'm running into cases where it's running at n^2, it could really be problematic if I was depending on n log(n) running time. However, if you just say it's O(n^2), I might consider it slower than it really is. So, it's important to be clear that it's O(n^2) with an expected polynomial time of n log(n).
Re:oh really? by jsonn · 2007-02-26 08:54 · Score: 1

The irony of this comment is that the binary logarithm of 10k is still much smaller than 32. Radix sort is efficient for other reasons.

First it doesn't require the whole dataset to exist at a time. This is nice for things like bzip2 where the data itself is partly generated.

Second it has predictable memory access patterns. This makes it nice for any modern CPU and is the reason why for many applications merge sort is already faster than quick sort.
Re:oh really? by drxenos · 2007-02-26 09:10 · Score: 1

If you are always dealing with purely random data, then true: you must bound a quick sort by O(n^2). Which is why "generic" sort functions in C, et. al., typically utilize the introspection sort. But how often is your data truly random? Knowing the best and average case of an algorithm is very useful if you know your data will not cause the worse case (or cause it for some acceptable limit of the data).

--

Anonymous Cowards suck.
Re:oh really? by Stalus · 2007-02-26 10:01 · Score: 1

Right, well, quick-sort hits the worst case when your data is sorted/almost sorted, and is only n log(n) for random data. So, if you're assuming that your data isn't random, then it's even more important to not advertise it as n log(n).
Re:oh really? by mgiuca · 2007-02-26 12:51 · Score: 1

The irony of this comment is that the binary logarithm of 10k is still much smaller than 32. Radix sort is efficient for other reasons.
Oh of course! You have to be larger than 2^32 ~~ 4,000,000,000! (Duh). So unless you're sorting google's database of web pages, it's not going to give better performance than nlogn.

(Wow, now that you mention it, nlogn is pretty close to n really, isn't it).
First it doesn't require the whole dataset to exist at a time. This is nice for things like bzip2 where the data itself is partly generated.
That doesn't seem right to me (since you need to do k iterations over the whole dataset, you must require it to exist in memory concurrently). But I'll have to think about it.

Can't believe I wasted my time by defile · 2007-02-25 08:12 · Score: 1

Against my better judgment, I skimmed the article, code.

Two observations:

Why no analysis of impact on memory usage? 10x speed-up, but at what cost? Omitting this detail makes it difficult to weigh the utility of the algorithm.
Not an Apples to Apples comparison. In the source, the "BitFast" sort converts the entire link-list into an array before performing the sort. Without even considering the merits of the actual algorithms, the Mergesort has no chance of being faster given the more complicated data structure used.

As others pointed out, this sounds like the Radix sort, but I was too disgusted to step through the source in detail and even verify this. Maybe someone else will have a stronger constitution.

Here is a good Mergesort implementation by Anonymous Coward · 2007-02-25 08:14 · Score: 1, Interesting

The poster has just implemented a radix sort. One can do them quite faster, but of course, they are basically restricted to sorting numbers, or at least bit strings with a reasonable diversity in the distribution of prefixes.

For a pretty efficient mergesort implementation in C/C++, try the following (has at one time been offered to libg++, but not taken up). It sorts, as an example, stdin line by line to stdout. The whole recursion stuff has been flattened into loops and binary arithmetic.
<blockquote> /* Copyright (c) 2007 David Kastrup dak@gnu.org */
#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

template <class T,class V> class listsort : V {
public:
listsort(const V &arg) : V(arg) { }
T * &sort(T* &headin, size_t n) const
{
size_t underpow,n1,n2,bitrev;
int lev, kmax;
T **head = &headin;
T **headstack[CHAR_BIT*sizeof(size_t)];
T **tail;
T **tail1;
T *p1;
T *p2;
int sp = 0;
switch (n)
{
case 0:
return *head;
case 1:
return getlink(*head);
}
for (underpow = 1, kmax=0; underpow <= (n-1)/4;) {
underpow *= 2;
++kmax;
}
for (lev = kmax, bitrev=underpow, n -= 2*underpow;;) {
p1 = *head;
tail = &getlink(p1);
p2 = *tail;
if (listleq(p1,p2)) {
tail = &getlink(p2);
} else {
*tail = getlink(p2);
getlink(p2) = p1;
*head = p1 = p2;
}
switch ((bitrev+n)>>lev) {
case 2:
p2 = *tail;
if (listleq(p1,p2)) {
tail1 = &getlink(p1);
p1 = *tail1;
if (listleq(p1,p2)) {
tail = &getlink(p2);
break;
}
*tail1 = p2;
} else
*head = p2;
*tail = getlink(p2);
getlink(p2) = p1;
break;
case 3:
headstack[sp++] = head;
head = tail;
++lev;
continue;
}
for (;;) {
if (lev == 0)
return *tail;
if (!((bitrev>>--lev)&1))

Re:Here is a good Mergesort implementation by Shivetya · 2007-02-25 08:29 · Score: 1

just tagging a message... no comment otherwise

--
* Winners compare their achievements to their goals, losers compare theirs to that of others.

Re:first, finally by grimJester · 2007-02-25 08:17 · Score: 0, Offtopic

:-)
hello world

It may disappoint you, but you lost to a post titled "Frothy piss". If getting a "first post" is really that important to you, I suggest you take this as a lesson and re-evaluate your priorities in life. Being nearly as good as "Frothy piss" is a sad, sad life achievement.

You're making the baby Knuth cry by aprosumer.slashdot · 2007-02-25 08:18 · Score: 5, Funny

You're making the baby Knuth cry. Please read "The Art of Computer Programming" before proceeding any further.

Re:You're making the baby Knuth cry by vadim_t · 2007-02-25 08:28 · Score: 1

If Knuth looks like a baby to you, then you've seen some really scary ones. My condolences.
Re:You're making the baby Knuth cry by atamyrat · 2007-02-25 09:24 · Score: 1

No he won't cry! I think he'll be proud of his inspiring new book. From FTA
1. Programming Skills : Visual Basic ***Excellent***
2. I will not go into much theory analysis of the algorithm
Re:You're making the baby Knuth cry by maubp · 2007-02-25 09:56 · Score: 1

If Knuth looks like a baby to you, then you've seen some really scary ones. My condolences.

You seem to be aware of Knuth as a Computer Science God, but I guess you didn't get the "Baby Jesus" analogy?
Re:You're making the baby Knuth cry by Anonymous Coward · 2007-02-25 11:02 · Score: 0

Someone didn't get the "face a mother could love" analogy? The funniest humour can be when someone takes someone else's joke and twists it in ways they didn't expect. ;-)

GPL-ed algorithm? by Wilson_6500 · 2007-02-25 08:22 · Score: 2, Interesting

I've done a bit of scientific programming, but I'm not a computer-science-oriented person by any stretch. Thus, I'm a little confused by this. Under what license are things like Quicksort and Stupidsort (for another, very bad example) "released"? Why would a person feel it necessary to release an algorithm under any kind of license at all, in fact? No textbook would want to include GPL-ed code--wouldn't that make the entire book GPL by nature of the license?

Please correct me if I'm mistaken, but this seems like a fairly bad precedent to set.

Re:GPL-ed algorithm? by Helios1182 · 2007-02-25 08:33 · Score: 4, Informative

You can't copyright an algorithm, so there is no license to worry about. This is a well meaning but utterly clueless person who reimplemented a well known algorithm and then made a strange constant-factor instead of asymptotic analysis.
Re:GPL-ed algorithm? by Hootenanny · 2007-02-25 08:34 · Score: 1

As far as I understand, source code can be placed under the GPL, but an algorithm cannot be. An algorithm is just a concept - it can be patented of course, but that's aside the matter here. Unless I'm mistaken, you could write your own implementation of BitFast and use it for commercial purposes, closed-source projects, whatever you want as long as you don't use their source code.
Re:GPL-ed algorithm? by reebmmm · 2007-02-25 08:47 · Score: 1

Algorithms aren't protected... a particular implementation can be. If you create an "original work of authorship," and fix it in a "tangible medium of expression", you've got a copyright. Therefore, the need for a license. Assuming that the source code was sufficient to meet the copyright standards, if the "author" hadn't said anything, it would have been protected and use would have been unlicensed.

IAAL, but I didn't RTFA, so I won't tell you whether it did actually meet those requirements.
Re:GPL-ed algorithm? by drooling-dog · 2007-02-25 10:03 · Score: 1

No textbook would want to include GPL-ed code--wouldn't that make the entire book GPL by nature of the license? Not as long as you provide the source (which of course you would; it's the whole point) and don't impose any additional restrictions on the license. If you modify GPLed code, then you have to release the revised source if you distribute the binary.

That said, a lot of nonsensical FUD has been circulated regarding the requirements of the GPL by parties who feel their interests are threatened by it. As with everything, it's best to read it for yourself.
Re:GPL-ed algorithm? by Nasarius · 2007-02-25 10:28 · Score: 1

IANAL, but I do believe you could patent it (well not this one, since it's not novel). A particular implementation can of course be copyrighted.

--
LOAD "SIG",8,1
Re:GPL-ed algorithm? by Anonymous Coward · 2007-02-25 12:43 · Score: 0

No. The surrounding text (around the software) is not a part of the software as such, and can be copyrighted using normal means. The software (if its licensed under the GPL) can be shared freely/copied from book to machine, and then from machine to machine as long as the source code, and a copy of the license (or a message where you can get the code and license) are included with the software. It can all be in a README along with the software. The description of the software, captions, comments, etc. are the copyright of the author/publisher. A huge number of algorithms are covered in the 3 volumes of D. Knuth's "The Art of Computer Programming" (and volume 4 may be out within the next 10 years or before Duke Nukem Forever or the next stable version of Debian, although geologic time scales can't be accurately predicted), so suits over algorithms are likely to result in pain for the plaintiff.
Re:GPL-ed algorithm? by TerranFury · 2007-02-25 17:41 · Score: 1

Patented algorithms:

Example 1: Marching cubes.
(Ok, so it's the table that maps voxel combinations to triangles that's patented, not the recursive idea, but still...)

Example 2: MP3
(Is it still?)

Example 2.1: Basically everything implemented by VLC. ;-)

Did KD set this guy up for ridicule on purpose? by grimJester · 2007-02-25 08:37 · Score: 5, Informative

Ok, I'll start from his site:

- Programming Skills : Visual Basic ***Excellent***

Yes, that certainly is... excellent.

- Message : "Don't ever let school, stop you from learning!"

School could, learn you some grammar.

The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.
*** DO NOT COPY ANYTHING FROM THIS PAGE TO ANY OTHER PAGE. IF YOU WANT SOMEONE TO READ THIS THEN LINK TO THIS PAGE ***

In addition to trying to apply copyright to an algorithm, doesn't a restriction on copying defeat the purpose of releasing something under the GPL? Or does text in all caps trum previous text not in all caps?

Feel free to add to this. If there are some clips of this guy with lightsabers, pleas post them.

Re:Did KD set this guy up for ridicule on purpose? by Brandybuck · 2007-02-25 09:02 · Score: 3, Informative

Geez, were you ever lucky! A ten ton joke just wooshed by inches over the top of your head! You could have been killed!

--
Don't blame me, I didn't vote for either of them!
Re:Did KD set this guy up for ridicule on purpose? by remahl · 2007-02-25 09:03 · Score: 2, Funny

I'll tell you what's ironic: People pointing out ironic things under the assumption they're not. That may make this message is ironic.
Re:Did KD set this guy up for ridicule on purpose? by grimJester · 2007-02-25 09:06 · Score: 2, Interesting

I think it's almost mandatory to have grammar or spelling errors in any grammar nazi post. I noticed the "pleas post" right after posting. As a newly found subset of Murphy's Law, someone should name it and become famous.
Re:Did KD set this guy up for ridicule on purpose? by jmac1492 · 2007-02-25 11:56 · Score: 1

As a newly found subset of Murphy's Law, someone should name it and become famous. Then release it under the GPL?

--
Jenny's got a new number! 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Re:Did KD set this guy up for ridicule on purpose? by digitig · 2007-02-25 13:20 · Score: 1

The algorithm is being released under the GPL ( General Public License ). The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.
*** DO NOT COPY ANYTHING FROM THIS PAGE TO ANY OTHER PAGE. IF YOU WANT SOMEONE TO READ THIS THEN LINK TO THIS PAGE ***

In addition to trying to apply copyright to an algorithm, doesn't a restriction on copying defeat the purpose of releasing something under the GPL? Or does text in all caps trum previous text not in all caps? In fairness, that's something he didn't get (entirely) wrong. The attempt at GPL applies to the algorithm, the shouty restriction applies to the page describing the algorithm. They're not the same thing.

--
Quidnam Latine loqui modo coepi?
Re:Did KD set this guy up for ridicule on purpose? by mrbobjoe · 2007-02-25 15:15 · Score: 1

As a newly found subset of Murphy's Law, someone should name it and become famous. Murfy's Law?
Re:Did KD set this guy up for ridicule on purpose? by Centurix · 2007-02-25 16:24 · Score: 1

And there it is. Marking the above post the point at where Godwin's law of Nazi analogies comes into effect.

--
Task Mangler
Re:Did KD set this guy up for ridicule on purpose? by mgiuca · 2007-02-25 19:30 · Score: 1

Note that he lists all of his "skills" as languages and programs. Languages != skills.

And was it intentional that his hobbies is an infinite list of the string "Computers"?
Hobbies : Loop
......................AddHobbie ("Computers");
................Do Until 1=2
(Is that VB, with a semicolon?)

So.. anyway, is it just me or has everything been taken off the site now? I thought it was a browser or slashdotting problem, but now there's a link to "Radix sort" on Wikipedia. Do you think they read all the slashdot posts, realised it was radix sort, then deleted everything and replaced it with a wikilink? Well I guess our job is done...
Re:Did KD set this guy up for ridicule on purpose? by Sirch · 2007-02-25 21:57 · Score: 1

I call it "Hawking's Law"...

--
We Build Beautiful Websites

The algorithm belongs to PhoenixBit and VirusFree by Anonymous Coward · 2007-02-25 08:40 · Score: 0

The algorithm belongs to PhoenixBit and VirusFree but you may use/modify it freely.

Huh? You can copyright the implementation you have (what allows you to put it under the GPL). That does nothing to control the algorithm... you would have to attempt to get a patent to do that.

solution in search of a problem by belmolis · 2007-02-25 08:44 · Score: 4, Insightful

This is a variant of RadixSort, which is well known to be faster than any comparison sort such as MergeSort. The problem with non-comparison sorts is that as such they are restricted to sorting items representable a unstructured bit fields, which means, essentially, integers. A large part of the time, the real problem in sorting is (a) extracting the fields that you want to use as keys (since it is not generally the case that you want to sort on the entire record) and (b) arranging for each pair of records to compare as you need them to, which involves both recognizing the internal structure of keys (consider the case of dates) and imposing suitable weights for the individual components. In other words, in many situations the bulk of the code and time are devoted to parsing and transformation of records. So long as you are not using a really bad algorithm, the time devoted to the sort itself is likely to be a small percentage of the total time.

For example, I have written a sort utility that differs from most others in its ability to parse the input into records and records into fields and in the transformations it can apply so as to use unusual sort orders and sort on things like dates, month names, and numbers in non-Western number systems. It was originally written for alphabetizing dictionaries of "exotic" languages. It is frequently the case that the time devoted to the actual sort is less than 1% of the run time.

In sum, non-comparison sorts have a niche but are of limited utility because they get their speed from making use of additional information that is only available for a limited set of datatypes. For the great majority of applications, only comparison sorts are flexible enough to be of use.

Re:solution in search of a problem by Anonymous Coward · 2007-02-25 09:17 · Score: 0

For example, I have written a sort utility that differs from most others in its ability to parse the input into records and records into fields and in the transformations it can apply so as to use unusual sort orders and sort on things like dates, month names, and numbers in non-Western number systems.

I do something very similar.

It's called a database.
Re:solution in search of a problem by Animats · 2007-02-25 09:18 · Score: 1

While radix sorts are limited, binned sorts, which are a superset of radix sorts, are more powerful. Binned sorts do work on a broad set of data types, but you have to use statistical techniques to dynamically pick a good distribution among the bins. SyncSort developed this technique in the 1960s, and had the first software patent, now long-expired. Mainframe batch systems have used such techniques for decades.
The general idea is that if you're sorting, say, text strings, you watch them go by for a while, accumulating statistics, then compute a set of bin division points which will divide them into bins of roughly equal size. The bins have to be watched, and if some get too big or too small, the bin division points have to be adjusted. It's a tree-rebalancing problem.
If your sort is too big for RAM, and you have to go out to disk (or, in early systems, tape) this is a huge win. It lets you beat O(N log N), both in theory and in practice. SyncSort times grow at worse than O(N), but better than O(N log N).
Few people bother for in-memory sorts, though.
Re:solution in search of a problem by belmolis · 2007-02-25 09:23 · Score: 1

I do something very similar. It's called a database.

No, it isn't a database. It doesn't support queries or allow for the insertion or modification of data. A true statement would be that the sort functions of database management and query programs are usually closer in flexibility to msort than are stand-alone sort utilities.
Re:solution in search of a problem by HBI · 2007-02-25 10:20 · Score: 1

You're right, but if a date is a Julian, then it's an...integer. I try to store all dates as Julians for that reason.

--
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
Re:solution in search of a problem by mstone · 2007-02-25 11:15 · Score: 1

Wow..you really need a copy of Transaction Processing by Jim Gray.

IIRC, it takes something like 10,000 operations to open a connection to a database server, another 1000 to parse the SQL, then however many are necessary to process the data itself. And that doesn't count the time losses for network latency for database servers that live on another machine, or process swapping for servers that live on the same machine.

A sorting algorithm written directly into your program will execute less code by an order of magnitude or so, and doesn't induce network/IPC latencies.

If you have a large dataset, you can amortize the startup latency of the database connection into the O(n log n) time spent cranking the data itself, but it takes a lot of data to make 10k-operations-plus-latencies statistically insignificant. For small-to-medium batches of data, even bubblesort is faster than using a database.
Re:solution in search of a problem by belmolis · 2007-02-25 17:26 · Score: 1

Interesting. Do you know of a publication of the algorithm? Google yields many hits for the company but apparently none for the algorithm.
Re:solution in search of a problem by Animats · 2007-02-25 18:59 · Score: 1

U.S. Patent #4,809,158 has a description of a more modern version of the SyncSort algorithm, with discussions of efficiency issues and comparison graphs.
This isn't the big deal it used to be, but back in the batch era, these algorithms knocked days off some big sort jobs.

Time for the "reinventthewheel" tag? by jfengel · 2007-02-25 08:45 · Score: 1

Ideally, I'd like a tag for the clump of "Computer science by people who haven't done the background reading" posts we get. I haven't seen "my algorithm compresses everything, even compressed files!" or "I've proved that P = NP!" posts in a while. But we should be prepared.

How about "badcomputerscience"?

Re:Time for the "reinventthewheel" tag? by Asztal_ · 2007-02-25 09:20 · Score: 1

mp3beatingcompression?
Re:Time for the "reinventthewheel" tag? by istartedi · 2007-02-25 11:48 · Score: 1

haven't seen "my algorithm compresses everything, even compressed files!

I've got one of those. It compresses all known files down to a string about the same size as the MD5 hash for the file. It works well on almost every popular file out there. Only trouble is, the download for my codec is roughly 3 to 5 petabytes. I'm working on that though.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Re:Time for the "reinventthewheel" tag? by qubezz · 2007-02-25 12:56 · Score: 1

How well does it uncompress these two 128 byte files?

d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89 55ad340609f4b302 83e488832571415a 085125e8f7cdc99fd91dbdf280373c5b d8823e3156348f5b ae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 e99f33420f577ee8 ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70

d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89 55ad340609f4b302 83e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b d8823e3156348f5b ae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 e99f33420f577ee8 ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
(remove all the slashtrash spaces first..) Hint: The MD5 hash of both is 79054025255fb1a26e4bc422aef54eb4.

See http://www.mscs.dal.ca/~selinger/md5collision/

My CRC-8 compression algorithm uses the same technique and beats your compression by a factor of 0x10!

I am sure by Anonymous Coward · 2007-02-25 08:48 · Score: 0

I am sure Microsoft has patented it by now.

int main by Anonymous Coward · 2007-02-25 08:55 · Score: 0

in the source examples, why is main of type long?

Re:int main by Srdjant · 2007-02-25 09:17 · Score: 2, Interesting

His main BitFast code is in header files...
Messy and inconsistent indenting (check c/ and c++/ files for differences)

Difference between c/* and c++/* is that c++/ has cout instead of printf()... well done...

This stuff should go to worsethanfailure.com (was TheDailyWTF.com)

This guy by ronnybrendel · 2007-02-25 08:56 · Score: 1

Just killed himself I guess. Not to mention that this news is total trash, sadly. Being faster than O(n log n) with random data, is pretty hard

Re:This guy by Anonymous Coward · 2007-02-25 09:48 · Score: 0

No, it isn't. There are a plenty of O(n) sorting algorithms. They have restrictions but not on the randomness of the input. It's hard (actually impossible) to do better than O(n*log(n)) for a comparison sorting algorithm.

kdawson is an idiot by Anonymous Coward · 2007-02-25 08:57 · Score: 0

That's why he put it up. This is just some huckster trying to (and succeeding in) making money off of idiot Slashdot moderators and viewers.

His algorithm :::is::: optimal by shoemakc · 2007-02-25 09:01 · Score: 1

Then again, his algorithm for becoming a loser may truly be optimal...

Hobbies : Loop ......................AddHobbie ("Computers"); ................Do Until 1=2

-Chris

--
--an unbreakable toy is useful for breaking other toys--

Gentle readers.... by Anonymous Coward · 2007-02-25 09:04 · Score: 1

Kindly, polite encouragement of younger computer programmers is preferred to flaming them.

Shock horror, stop the presses by sholden · 2007-02-25 09:09 · Score: 2, Informative

Retard who thinks you can GPL an algorithm manages to reinvent an algorithm first implemented as a computer program in the 1950s, and used way before then outside of digital computers.

Also manages to not know about memset and write C code that assumes sizeof(long)==4 and puts non-inline code in header files. I don't dare look at the C++ code since there's a large probability it will cause permanent brain damage.

Re:Shock horror, stop the presses by Srdjant · 2007-02-25 09:22 · Score: 1

> Also manages to not know about memset and write C code that assumes sizeof(long)==4 and
> puts non-inline code in header files. I don't dare look at the C++ code since there's a
> large probability it will cause permanent brain damage.

On 32-bit machines, sizeof(long) is 4, though I think on 64-bit machines this is 8 (but I'm
not sure).

Please don't look at the C++ code..
Re:Shock horror, stop the presses by sholden · 2007-02-25 09:35 · Score: 1

On 32-bit machines, sizeof(long) is 4

Chapter and verse please. Where in the C Standard does is that stated? sizeof(long)>=4 is what I recall. Of course historically (and for compatibility with code written by people who don't care about portability) sizeof(long)==4 is the norm. sizeof(long)==8 is what makes sense for 32 bit machine. After all sizeof(int) will be 4, but due to those aforementioned programmers it never is...
Re:Shock horror, stop the presses by Srdjant · 2007-02-25 09:43 · Score: 1

AFAIK, K&R state that int can be 2bytes or 4bytes.
long is 4
long long is 8

Try this:

#include <stdio.h>

int main(){
printf("Size of int: %d\n", sizeof(int));
printf("Size of long: %d\n", sizeof(long));
printf("Size of long long: %d\n", sizeof(long long));
return 0;
}

On a pure 32-bit machine:

[srdjant@thor ~]$ ./sizes
Size of int: 4
Size of long: 4
Size of long long: 8

So you must be wrong in that regard.

I can't comment for the time being on the C standard saying that sizeof(long) >= 4.
Re:Shock horror, stop the presses by Srdjant · 2007-02-25 09:51 · Score: 1

I should add, that on 64bit machines, this changes.
On 64bit machines, long should be 64bit and thus sizeof(long) on
those machines should be 8.
Re:Shock horror, stop the presses by sholden · 2007-02-25 10:22 · Score: 1

I don't use K&R C anymore. I don't care what a particular compiler uses. I care what the standard says and hence what is portable. I don't have a recent standard, but the chances they changed this basic is stuff is infinitesimal:
5.2.4.2.1 Sizes of integer types

[#1] The values given below shall be replaced by constant
expressions suitable for use in #if preprocessing
directives. Moreover, except for CHAR_BIT and MB_LEN_MAX,
the following shall be replaced by expressions that have the
same type as would an expression that is an object of the
corresponding type converted according to the integer
promotions. Their implementation-defined values shall be
equal or greater in magnitude (absolute value) to those
shown, with the same sign. ...

-- minimum value for an object of type long int
LONG_MIN -2147483647 // -(2^31-1)

-- maximum value for an object of type long int
LONG_MAX +2147483647 // 2^31-1

-- maximum value for an object of type unsigned long int
ULONG_MAX 4294967295 // 2^32-1

Emphasis mine, which with 8 bit bytes, means sizeof(long)>=4.
Re:Shock horror, stop the presses by Anonymous Coward · 2007-02-25 10:26 · Score: 0

Nope. sizeof(long) is compiler-dependant, not architecture-dependant. It is, however, guaranteed to be at least 4 bytes.
Re:Shock horror, stop the presses by Srdjant · 2007-02-25 10:38 · Score: 1

Thanks for correcting me.

In which case, you are right about sizeof(long)>=4.

However I don't think many compilers define it longer than 4 bytes on
32bit machines.
Re:Shock horror, stop the presses by Magura · 2007-02-25 16:49 · Score: 1

I thought BefDC was proof of that brain damage years ago.
Re:Shock horror, stop the presses by T.E.D. · 2007-02-26 02:58 · Score: 1

Also manages to not know about memset and write C code that assumes sizeof(long)==4 and puts non-inline code in header files.

I've seen several people bag on his use of non-inline code in header files. Is that really considered bad practice? I ask because we have a developer here in who loves to do that. Personally, I dislike the practice, but he has a job teaching C++, so I don't feel like I have the cred to gainsay him.

wtf by mpoloks · 2007-02-25 09:09 · Score: 0

a new algorithm 10x faster...from two guys calling themselves VirusFree and PhoenixBit?

sigh

Re:wtf by Anonymous Coward · 2007-02-25 09:36 · Score: 0

Yes. We haves m4d skillzors. 10x faster than n00b quicksort.

Signed,
PhoenixBit

Correctness by Anonymous Coward · 2007-02-25 09:19 · Score: 0

Bit level comparison of floating point numbers won't work in general, and more specifically, it doesn't work in the code you posted.
Also, after translating your time functions for unix, your algorithm does not terminate unless compiler optimizations are turned on (gcc -o moo main.c -O3).

Barring the fact that floating point is broken, comparing your radix sort to another array-based sorting algorithm might be more productive. O(N log N) is the lower bound on non-parallel, and much of the battle is coming up with lower multiplicative constants, as you are trying to do. Again, your comparison

You state that you need a constant amount of memory, in your case this is sizeof(long) * 2 * 2^16 = 2^19 bytes, which is half a meg.

Testing the long version doesn't seem to get equivalently sorted lists either. For one thing you seem to be sorting in the opposite direction as mergesort.

Ignoring correctness, you only have a speedup of ~1.3 over qsort.

Where's the proof? by frostilicus2 · 2007-02-25 09:24 · Score: 2, Insightful

Perhaps coming from a math background, my priorities differ from those with a CS background, but...

Where's the proof?

Is it peer reviewed? Has it been published? Does it exist?
Although the lack of a proof does not make your algorithm is incorrect, I'd be suspicious. In fact, I would be unwilling to use any algorithm in my code without being sure that a good proof of its correctness exists somewhere.
Proofs are more than formalities, they're essential: without a good proof, you're just a crank selling snake oil.

--
Nothing sucks like a Vax, nothing blows like a PowerMac G4

Re:Where's the proof? by Anonymous Coward · 2007-02-25 17:27 · Score: 0

Perhaps coming from a math background, my priorities differ from those with a CS background

If computer scientists didn't care about proofs there wouldn't be any computer science in the first place. But indeed, people with a "math background" (which on Slashdot usually means something like "took two calculus courses and something about abstract algebra"), like you, do differ from CS types in that:

a) They wouldn't recognize radix sort if their lives depended on it.
b) They can't prove the correctness and complexity of a fairly trivial algorithm by themselves, but need it published in a peer-reviewed journal first. Presumably, so someone else can do it for them.
c) They think you need a formal and fancy full derivation, typeset in LaTeX and with 2" margins, before you decide on using any kind of algorithm, whereas people who actually program for a living invent and use algorithms everyday, without problems and without formalia.

But yeah, you go ahead and wait for that journal article about the reinvention of radix sort by a 15 year old kid from Cyprus, while the rest of us get back to work.

Buffer overflow by TheMoog · 2007-02-25 09:24 · Score: 2, Insightful

Unless I'm very much mistaken, there's a blatant buffer overflow in this code. In bitfast.h the "Tail" and "Head" arrays are defined as having 65535 (0xffff) entries. That's from 0-65534 inclusive.

The code in "BitFast" then uses the array as if it can access Head[0xffff] and Tail[0xffff] - which is one past the end of the arrays. I'd guess it works by coincidence on Win32.

Additionally, although there might be some reason I can't follow at the moment, in the float version only the top 15 bits of the values are seemingly used. This would seem to ignore the sign bit of the FP representation. Maybe there's another subtlety I'm missing, but I can't see it at the moment.

Given these basic misunderstandings of programming concepts I'm not surprised the authors hadn't known of the radix sort which many people here have linked as being very similar indeed.

--

GodboltBlog

Re:Buffer overflow by Anonymous Coward · 2007-02-25 18:28 · Score: 0

The memory allocation routines in C and C++ usually allocate more memory than needed. Frequently you can go one or more bytes over without a problem. It's not Win32 specific, it's an efficiency hack done by most OS and compiler designers. These off by one errors are frequently missed when the memory is allocated on the heap.

-ShadowRanger
Re:Buffer overflow by Anonymous Coward · 2007-02-25 19:32 · Score: 0

It doesn't work by co-incidence on WIN32. The malloc/new([]) allocate in 8 char units; the allocation size is therefore always aligned to 65536 long objects. It's still a bug, but it working isn't a co-incidence.

The c++ language police would of course say, that the code invokes UNDEFINED BEHAVIOR and sperm will fly out of Demon's noses, whatever.
Re:Buffer overflow by CoCoUmlat · 2007-02-25 21:56 · Score: 1

I'm not going to go into the intricacies of how the above comments differ from why this implemention really won't crash on the more common OSs, but I will say (not having looked at the code given it isn't on the linked page right now) that there definitely are cases where a crash could result due to his supposed buffer overflow -- albeit not at the actual time of the overflow.

If for instance you had a struct { long valueArray[65535]; long *valuePtr; } and then attempted to write into valueArray[65535], you would likely be writing into valuePtr, thus leading to a crash at a later point when it tries to derefence it. Even malloc debugging tools would not catch this particular error in action, but that's why gdb has watch points!

Oh yeah, and LP64 and ILP64 systems have 8 byte sized longs which goes against the "allocate in 8 byte units" assumption for why it doesn't crash (it tends to allocate ALIGNED to 8 bytes on some systems, and 16 on others, due to vector programming reasons).
Re:Buffer overflow by TheMoog · 2007-02-25 23:31 · Score: 1
Critically though, and as pointed out in some other posts:
- Writing off the end of an allocated piece of memory is underfined. Just because it might work now, doesn't mean it'll work later.
- More importantly, these were allocated off the stack, not using malloc/new. The stack is usually DWORD aligned. Thus accessing Head[0xffff] could well actually be accessing Tail[0].
Whatever, relying on undefined behaviour is relying on coincidence. In your example the malloc/new alignment is a coincidence of the particular heap of the particular version of the C runtime. There's nothing to say it won't change between VC6, VC7 and VC.NET (and actually, it did). Nor does it mean it will work on other platforms. It's just a coincidence that on the platform it was developed for, with the compiler and associated runtime it was developed for, it worked.
--
GodboltBlog

How many stupidities in one post? by Lord+Duran · 2007-02-25 09:26 · Score: 2, Funny

1. Loop ......................AddHobbie ("Computers"); ................Do Until 1=2 Since he says he's "**Excellent**" in Visual BASIC, one could ponder how he MIXED UP THE Do AND Loop STATEMENTS. 2. Don't ever let school, stop you from learning! He apparently never had... 3. 10 times faster than MergeSort? That's asymptotically, right? O(10nlogn). Call the press. (Doesn't matter that Radix Sort actually takes O(n), as long as it's 10X!) 4. BitFast start by creating two arrays of size 65535 ( pow(2,16) ) Anyone got a calculator? Somehow his numbers don't add up. Is it actually possible for a positive power of two to be odd? And...Since when do you use pow to calculate powers of two? 5. They are not the most error free,optimized,commented,beautifull code to have ever been written Gee...really?

Re:FROTHY PISS by VENONA · 2007-02-25 09:27 · Score: 5, Insightful

Explanation: pretty much bogus.

As for the quality of your post: let me guess--you also complain about slow, bloated software, right? The old, "Intel giveth, Microsoft taketh away," adage? Users have several orders of magnitude more compute and storage power than 'back in whatever day' yet personal computers seem little more responsive, etc.

Don't feel lonely. There's a large population of lugnuts like you, who, if they think of CS at all, largely carp about how some CS departments haven't become current technology de jour tradeschools. Some, unfortunately, have, but that's a whole different discussion, which has been seen on Slashdot time and again.

Algorithm research is important, as is having at least something of a grasp of algorithms. In *your* next programming exercise, since you seem to regard sort efficiency as 'esoteric', feel free to reinvent the bubble sort. Also, tout it as the Next Great Thing, and submit patches against all your favorite apps. That will get you your twenty seconds of fame, I assure you.

Sometimes I love Slashdot--but then I read a post from some random AC idiot like you: the proverbial lowest common denominator. Maybe you should restrain your efforts toward what you seem to regard as cool snarky posts, watch a thread (about which you plainly know nothing) develop, and maybe freaking *learn* something.

OTOH, maybe I'm just bitter right now, because I've just been doing a search through Google news on climate change, and I'm pretty much convinced that the last thing the human race needs right now is people like you.

--
What you do with a computer does not constitute the whole of computing.

Knuth-h by jackb_guppy · 2007-02-25 09:28 · Score: 1

Ah, the book of gods and the cure for insomnia. I have read that book 3, 4 times. The best I can master is 4 pages before falling asleep! It was a of great help when I was building an index method for Self Balancing Multiway Tree, with key compression and redundant data removal for the IBM S/34 and S/36.

Re:Knuth-h by mstone · 2007-02-25 10:37 · Score: 1

Book?

Last time I looked at my bookshelf, there were three: Vol 1 - Fundamental Algorithms, Vol 2 - Seminumerical Algorithms, and Vol 3 - Sorting and Saerching. There have also been three fascicles of Vol 4 - Combinatorial Algorithms released on the web, though the book itself hasn't gone to print yet, AFAIK.

Invented by George Papadopoulos by Digital+Vomit · 2007-02-25 09:31 · Score: 1

Who knew Webster's dad was such a great hacker?

--
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.

Please be kind to this kid by 140Mandak262Jamuna · 2007-02-25 09:37 · Score: 1

He is from Cyprus. Most likely English is not his first language. Please forgive his typos and grammar mistakes. Looks like he is hobbyist programmer who did not undergo any formal comp sci training. So he stumbled on to radix sort and thinks he has invented something new.

I am not saying he is Ramanujan or anything like that. But still every amateur deserves to be forgiven once.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact

Re:Please be kind to this kid by ScrewMaster · 2007-02-25 10:17 · Score: 1

No, usually they get slapped down hard once or twice by somebody more knowledgeable, and the ones who are actually smart and capable of learning wise up and don't let it happen again. It's part of the lengthy process of going from "newbie" to "professional". Painful, to be sure.

--
The higher the technology, the sharper that two-edged sword.
Re:Please be kind to this kid by heroofhyr · 2007-02-25 10:37 · Score: 1

I too am a little surprised by how harsh some people here have been to this guy. A certain amount of joke-making is expected, and some of the jokes are funny, but other posts here are just hate-filled examples of pure douchebaggery that beg some sort of psychoanalysis. At least give him credit for thinking he had done something important and new and having the decency to (however illegitimately) try and apply the GPL to his idea instead of having little money symbols glaze his eyes over. Yeah, he hasn't done anything groundbreaking, but cut him some slack. As soon as he realises his algorithm isn't 100% original he's going to feel pretty stupid and embarrassed. Don't make it worse in some sort of pathetic attempt to make yourself feel marginally superior. This isn't gym class.

--
brandelf: invalid ELF type 'KEEBLER'

Re:The algorithm belongs to PhoenixBit and VirusFr by maxwell+demon · 2007-02-25 09:39 · Score: 2, Funny

Of course one would hope that the patent examiners at least have heared about radix sort, and therefore would reject that patent due to prior art ... but then, maybe he can avoid that by adding "with a computer" :-)

--
The Tao of math: The numbers you can count are not the real numbers.

Memory leaks... by fprog26 · 2007-02-25 09:41 · Score: 4, Interesting

Inside bitfast.h: long *Tail = new long[65535]; long *Head = new long[65535]; for(n = 0; n < 65535; n++){ Head[n] = 0; Tail[n] = 0; } Memory leak of 128KB for each time the function "Node* BitFast(Node* HeadNode,long run)" is called. MISSING: delete[] Tail; delete[] Head; Furthermore the following is faster or use memset: long Tail[65535] = {0}; long Head[65535] = {0}; Unless you are running this in DOS16, DOS32 with Pharlap or smaller 8051, 68000 processor, you do not need to use new[] or malloc() for this. In InsertToList, Node *tmpP = (Node*)malloc(sizeof(Node)); free(tmpP) missing inside FreeList()... no such function In main.c: Node *L1 = malloc(sizeof(Node)); Node *L2 = malloc(sizeof(Node)); MISSING: free(L1); free(L2); In main.cpp: Node *L1 = new Node; Node *L2 = new Node; Instead use (no need to use heap, use stack memory): Node L1 = {0}, L2 = {0}; Mixing the usage of std::cout and printf() is not recommanded. Use all of the former or the later, else you will have weird display on some run, unless you flush the output buffer. I suggest you use printf() all the way. cout << "Ok!!!!" << "\n\n" << "Merge Sort : " << tim1 << " ms\n" << "BitFast : " << tim2 << " ms\n\n"; becomes: printf("Ok!!!!\n\nMerge Sort : %ld ms\nBitFast : %ld ms\n\n", tim1, tim2); Furthermore, your implementation of link list, and calling Length(Node*) for every equal is highly inefficient, requires O(n). Store the link list length somewhere when inserting. EqualSplit takes O(1.5n) instead of O(0.5n) because of that. Something like: typdef struct LinkList { Node head; int length; } LinkList; As a result, MergeSort recursively call itself, which calls EqualSplit every turn. You should make your header file C friendly, and avoid mixing // with /* */, malloc and new[] No clue why this header is needed: #include <iso646.h> instead of not/and use use ! and && before: if(not p) return NULL; becomes: if(!p) return NULL; Finally, BitFast is not recursive; therefore, you should try to find an implementation of merge sort which is not recursive also, so you save on function calls, if possible. "However, iterative, non-recursive, implementations of merge sort, avoiding method call overhead, are not difficult to code." -- Wikipedia http://en.wikipedia.org/wiki/Merge_so rt Your function should be in a file.c or be inline.

Not possible... by gweihir · 2007-02-25 10:15 · Score: 1

Merge sort is optimal. A good implementation will be as fast as any other O(n log n) algorithm, maybe
withing a factor of two, as long as general purpose CPUs are used.

It seems this is an O(n) algorithm, of which there are two: Bucket-sort and Radix-sort. No other ones are possible. These are better in theory and practice, but have limited applicatbility. i.e. do not work for all input data. Merge-sort is an universal sorting algorithm.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

DING ... Algorithms are not copyrightable... by Anonymous Coward · 2007-02-25 10:20 · Score: 0

Hate to burst the bubble, but an algorithm is not copyrightable.

You can copyright a specific implementation, or your own specific description of the algorithm, but you cannot copyright the algorithm itself.

For example, the specific instructions (the list of ingredients, and the process of mixing them together) of how to bake your dear Aunt's double layer chocolate cake cannot be copyrighted. However if you want you can write a little booklet describing how your Aunt would make this cake for your birthday and how much it meant to you and incorporate the ingredients and the process as part of the story. Or you could collect all of your families recipes together and make a copyrighted cookbook of the collection.

Here is the backing comment from the US Copyright office:

Extent of Copyright Protection
Copyright protection extends to all the copyrightable expression embodied in the computer program. Copyright protection is not available for ideas, program logic, algorithms, systems, methods, concepts, or layouts.

http://www.copyright.gov/circs/circ61.html

If you want to protect the idea/algorithm itself, file a patent

Besides .. it's just a radix sort..

Toilet sort by tepples · 2007-02-25 10:22 · Score: 1

It's "I ran his sort on this post", but I like "I ran his sort on shit pots" better.

Re:Toilet sort by Anonymous Coward · 2007-02-25 17:29 · Score: 0

"Tina's rhino is shortstop"???

OK idea, poor implementation by renoX · 2007-02-25 10:31 · Score: 1

While the combination of radix sort and linked list is interesting, the implementation is quite poor: to contain all the bit in a 16bit word you need an array of 65536 bits not 65535!!

Minor points:
-It's using an ugly hack to split the 32bit values into the low or high 16b, why not just duplicate the function?
-Ptrs and Tail are initialised manually instead of using memset which can be more efficient.
-It would perhaps be worth using one combined array for Ptrs and Tail to improve cache locality (probably a very minor gain).

Knuth vol 3 page 173 by CustomDesigned · 2007-02-25 10:33 · Score: 3, Funny

The Art of Computer Programming, vol 3, Sorting and Searching, 1973, by Donald Knuth Page 173, Algorithm R - radix list sort. No credit is given. The algorithm has been well known and obvious since the first link list was invented. Hey! A good choice for a patent!

Re:Knuth vol 3 page 173 by mjjw · 2007-02-25 23:32 · Score: 1

Hmm obvious patents, but they wont get the chance to because I've just finished patenting 'breathing' or as my patent is filed 'An internal organ which inflates and deflates in order to facilitate the exchange of gases into and out of the human body' ... I can be very selective on who I license this patent out to.

--
If you aren't far left by the age of 18 you have no heart. If you aren't far right by 30 you have no brain.

Multiple bugs in the code, wrong measurements ! by wtarreau · 2007-02-25 10:34 · Score: 5, Interesting

Don't use this code !

First, there are multiple bugs in the code itself :

1) everywhere, it is assumed that 2^16 = 65535 and not 65536. Arrays are allocated for this size too, but are used with 65536 entries. (un)fortunately for him, the space is probably allocated anyway so the program does not crash on this.

2) a similar bug in the random number generator carefully avoids dangerous values, because he generates numbers between min and max by doing a modulo on max-min , which prevents max from being produced. This is not a
big deal anyway, just that it hides the bug.

3) he assumes everywhere that "long" is unsigned and "short int" too. This makes the code SegFault on Linux with negative array indexes. The fix is obvious, though.

And most of all : times are measured in optimal conditions. Indeed, this algorithm is
just a radix sort with a large radix (2^16). A radix algorithm is efficient when the number
of elements is high compared to the radix, because at every level, all radix entries have to
be scanned for existing values. In his example, he sorts 4 million elements. His code is faster
than the merge sort on 4 million entries, just as would be any radix sort. But on smaller sets,
it still takes the same time per level, because it has to scan all 65536 entries even if there
are only 100 numbers. The fact is that for less than 10000 values, the mergesort becomes faster in his own code.

When measuring algorithms complexity, we use the O(x) notation. The time per iteration is never
considered. When the time matters (as in his example), we should consider the time per iteration (T), and the total time (t) will follow :

t = T . complexity

In complexity, his algorithm is O(N) on 16 bits, but the time per iteration is very high. Other sorting algorithms are O(N.log(N)). So the total time is :

t1 = T1 . O(N) for his algorithm, and :
t2 = T2 . O(N.log(N)) for the mergesort.

But when T1 > T2.log(N), he looses, which is almost always because log(N) is small for 16 bits, and T2 too because operations are very simple. In his case, T1 is very high because it represents the time required to scan 65536 values.

It's amazing that he fell in such a beginner's trap. I suppose that he's still a student, which would explain the total lack of portability and newbie's bugs in the code.

Well tried anyway :-)

Willy

Re:Multiple bugs in the code, wrong measurements ! by owlstead · 2007-02-25 13:03 · Score: 1

"The fact is that for less than 10000 values, the mergesort becomes faster in his own code."

Ok, I agree on all your other points, but this is one always irritates me. Only someone very theoretical would mention that for 10000 (integer) values or less this sort is sub-optimal. Come on, how many ms would that sort really take? For less than 10000 values you could use a 286 and get it over relatively quickly.

I almost laughed all the way from my chair when I heard this in college. Same mistake. Nobody cares for values that low. It could be O(N^2) for all I care.
Re:Multiple bugs in the code, wrong measurements ! by slashdot.org · 2007-02-25 13:26 · Score: 2, Informative

Good points. But, we should give the guy some credit, he does state in the readme:

This examples are for Proof of Concept only.

They are not the most error free,optimized,commented,beautifull code
to have ever been written

Personally I would never write code like this, proof of concept or not, avoiding the portability problems for example would not have taken more time to write. But pressumably more experience. So probably the best thing to do is be constructive about it.

I'll give it a shot on some of the issues that I found:

- don't assume long = 32 bit and short int = 16 bit. Different compilers/platforms will have different sizes. If you need to proof of concept something and don't want to deal with the portability issues, do something like:

typedef unsigned long u32;
typedef unsigned short u16;

And then use u32/u16 throughout. That way, if someone wants to port the code, all they have to do is define the proper typedefs for their platform (I usually put that stuff in it's own header, like "to_port.h" or something).

Also, it makes it very obvious that you are relying on certain vars to be of a certain size.

- don't rely on how variables are stored in memory i.e. on what endian-ness the architecture is. If you take a 32 bit unsigned integer, the low 0-7 bits are not stored at the same memory address on different architecture. If you don't understand what I'm talking about, google for little-endian & big-endian. If you don't want to deal with special casing it, do something like #define LITTLE_ENDIAN 1 in your to_port.h, and then in the code do something like
#ifndef LITTLE_ENDIAN #error This code assumes little endian #endif
- this code:
long Ptrs[65535]; long Tail[65535];
stores 512KB on the stack. That's generally considered quite extravagent, to put it mildly.

- use memset instead of this code:
for(n=0;n<65535;n++){ Head[n]=0; Tail[n]=0; }
as an example, Microsoft's compiler can inline memset making it extremely fast, compared to a for loop. Btw. the write-through cache on the CPU would probably prefer it if this was broken up into two separate loops, so memory gets accessed linearly, instead of jumping back and forth.
Re:Multiple bugs in the code, wrong measurements ! by cachimaster · 2007-02-25 13:48 · Score: 0

Yes, don use this code.
They are exploitable bugs known as off by one overflows and Integer overflows .
I hope not, but these bugs may be there on purpose.
Re:Multiple bugs in the code, wrong measurements ! by blueskies · 2007-02-25 14:40 · Score: 1

I haven't followed the actual code, but setting pointers to 0 is actually wrong (if right for the specific architecture then it is semantically wrong). Shouldn't they be NULL instead? I haven't coded C in a long time, but I remember reading the comp.lang.c faq, and they warn against using memset to zero data structures because not all architectures use a 0 for Null.

Personally, I think you might never get bitten by the bug, but it makes good programming practice to use Null instead.
Re:Multiple bugs in the code, wrong measurements ! by atrus · 2007-02-25 16:08 · Score: 1

Architectures don't know about NULL, compilers do. It is bad form to set a pointer to an integer (0) though. NULL is more meaningful (when reading code, the computer doesn't give a damn and will happily try following your NULL or 0 pointer ;)).
Re:Multiple bugs in the code, wrong measurements ! by Anonymous Coward · 2007-02-25 17:22 · Score: 0

According to the standard, the value of a null pointer is 0. NULL is a commonly defined macro which evaluates to something like (void*)0. Strange, but true (I never learned this either until recently).

Personally, though, I'd agree that I find NULL much more readable.
Re:Multiple bugs in the code, wrong measurements ! by wtarreau · 2007-02-25 18:20 · Score: 2, Insightful

"The fact is that for less than 10000 values, the mergesort becomes faster in his own code."

Ok, I agree on all your other points, but this is one always irritates me. Only someone very theoretical would mention that for 10000 (integer) values or less this sort is sub-optimal. Come on, how many ms would that sort really take? For less than 10000 values you could use a 286 and get it over relatively quickly.

I almost laughed all the way from my chair when I heard this in college. Same mistake. Nobody cares for values that low. It could be O(N^2) for all I care.

Are you kidding ???

Obviously you don't know what sort algorithms are use for ! Any real time system or multi-task
subsystem (application or kernel) needs a scheduler to very quickly sort timers in order to wake
up tasks at the right moment without scanning the whole list. For instance, in a proxy like
Squid, you would need something like 10000 struct timeval to handle 10000
connections. If the algorithm takes milliseconds to sort 10000 values, your proxy will be
the slowest on earth. Worse, if it takes as much time when you have only 10 or 100 connections,
it is a real shame. Sorting 10000 values should take microseconds in such applications.

The same is true for kernel code. Did you ever wonder how the socket timeouts are handled ?
When you have to update your timeouts at every packet, you need something very fast,
clearly not an O(n^2) algorithm, nor bitfast

Willy
Re:Multiple bugs in the code, wrong measurements ! by slashdot.org · 2007-02-25 21:37 · Score: 1

I haven't followed the actual code, but setting pointers to 0 is actually wrong (if right for the specific architecture then it is semantically wrong). Shouldn't they be NULL instead? I haven't coded C in a long time, but I remember reading the comp.lang.c faq, and they warn against using memset to zero data structures because not all architectures use a 0 for Null.

Perhaps,- that's something that could be argued about. However, the case in point has nothing to do with pointers. He named the variable Ptrs, but the matter of fact is that it's declared as an array of longs, not an array of "long *". So using memset is fine. But perhaps there's something underlying wrong right there in the code; maybe it should have been actual pointers, not longs to begin with.

With regards to NULL, I'd like to have that argument. I think it's very hard to come up with a reasonable hmmmm, well, reason, why NULL should not basically be 0. I never use NULL, because it's a define. It doesn't make sense to me, is it defined as an integer or as a float, or a pointer? (I know the answer, but I'm saying that the basic concept is wrong). AFAIK it's not part of the standard. So if I were to initialize a pointer I'd use something like:
char *ptr = (char *)0;

That should be fine. Although one interesting point is that I'm not sure that a 0 pointer is technically defined as 'bad'. That is to say, I can imagine a platform where a pointer points to memory address 0, and that's perfectly fine. Not any platform that I'm aware off (except for special cases), but still. Could take a long night of stiff drinks before you'd convince me to use NULL ;-)
Re:Multiple bugs in the code, wrong measurements ! by EnglishTim · 2007-02-26 00:33 · Score: 1

You're assuming that:

a) The operation will be done on a modern PC rather than a less powerful, embedded platform.
b) The operation will only have to be done once.
c) The operation will only have to appear quick to a human.

Take, for example a game being written on a Sony PSP. You've got a 222Mhz processor, and to keep a steady 30fps, you need to get all your processing for a frame over in 33ms. Perhaps you've got some to sort some data once per frame for each of the fifty AI agents in the scene, and you're only prepared to spend 1ms / frame to do it. Suddenly you've only got 44,000 cycles to do the sort in. The choice of algorithm (even if each list you sort is pretty small) is going to be very important.

There are many other situations where you need to sort even quite a short list in a very short amount of time, normally because you will have to sort that list many times a second.

I nominate this... by Anonymous Coward · 2007-02-25 10:40 · Score: 0

I nominate this slashdot post to the Worst Slashdot Post for 2007. It's going to be hard to topple.

Re:I nominate this... by DrMindWarp · 2007-02-25 10:59 · Score: 3, Funny

It's only March. There'll be an inventor of the internal combustion engine along shortly.
Re:I nominate this... by larry+bagina · 2007-02-25 14:18 · Score: 1

just wait until it gets duped in a couple weeks.

--
Do you even lift?
These aren't the 'roids you're looking for.

And another one, if /. would let me... by drooling-dog · 2007-02-25 10:47 · Score: 1

Tried to post some lovely code here, but was refused by the "Lameness filter", which said I have too many "junk characters". Well, of course I do - it's C++ code! Lots of ampersands and such, which apparently it won't accept even embedded in ecode or blockquote tags.

Anyone have any advice?

Re:And another one, if /. would let me... by Anonymous Coward · 2007-02-25 16:17 · Score: 0

The normal thing is to dump some random text at the end of the post.
Re:And another one, if /. would let me... by Nazlfrag · 2007-02-25 16:52 · Score: 1

You could post dodgy pseudocode. As a test..

linearRadixSort(foo[],length,min,max) { size=(max-min)+1 temp[size] for i=0 to size-1 (temp[i]=0) for i=0 to length-1 (temp[foo[i]-min]++) k=0 for i=0 to size-1 ( ..for j=0 to temp[i] ( ....foo[k++]=i+min ..) ) }

Pick the Knuth by femto · 2007-02-25 10:52 · Score: 1

Pick the Knuth: both have a round face, thin hair and no teeth!

image 1

image 2

Be nice by l3mst0r · 2007-02-25 11:01 · Score: 2, Informative

I invented Radix sort once, too. Who hasn't? Let him have his 15 minutes of fame.

Re:The algorithm belongs to PhoenixBit and VirusFr by Anonymous Coward · 2007-02-25 11:44 · Score: 0

if this new sorting algorithm worked with arbitrary data, it could be applied by the patent office to find (10x faster then merge sort) that the algorithm is not new.

I'm not really sure, but. by Goaway · 2007-02-25 12:11 · Score: 3, Funny

Maybe this guy has re-invented radix sort? I can't really tell! I wish somebody would post and tell me!

Did you see the "hacker" section? by Anonymous Coward · 2007-02-25 12:18 · Score: 0

I think every thing's been said about the radix search on a SORTED linked list. I think his benchmark excludes his sort making it faster. Plus if I wanted a sorted list I'd use an data structure that inserts the value, reducing the sort overhead, I know the trade off and it's debatable.

They should rename it to lamers script kiddie intro.

For his buffer over flow, he uses a classic sled technique without even knowing what it is. He calls it a buffer. You don't even need to do that just get the address, offset and insert.
also fails to mention that if it's on the heap most o/s's won't execute from there.
Sorry if there are any typos it's my Birthday and I'm drunk!
Plus I'm annoyed I was excited to check out a new algorithm and what a disappointment.

And my validation word is warrants that's scary! I'm strictly white hat I swear!!!!

Re:Did you see the "hacker" section? by Anonymous Coward · 2007-02-25 12:33 · Score: 0

I totally messed up my joke in the first line, I was trying be funny. Oh well I'm going to sober up for awhile.
I love the flag reinvent the wheel!!

A couple of non-obvious notes about radix sort ... by RallyDriver · 2007-02-25 12:20 · Score: 2, Insightful

1. Although it's technically O(n) compared to QuickSort's O(n log n), consider the fact that

a. the number of passes required by radix sort is proportional to the key size, and that
b. if the keys are primarily unique, then the key length will be O(log n) .... thus radix sort is actually O(n log n) when applied to practical cases, hence why everyone uses quicksort or mergesort instead.

2. One cute feature of radix sort for very large data sets is that you only need linear access to the data, so it can be done with non-random-access external storage. The one practical implementation I've ever heard of used four 9-track tape drives.

However, considering use in main memory, both radix sort and quicksort make efficient use of a D-cache that is 2-way set-associative or better, and both hit the whole data set once per pass -> if it swaps, it'll swap a lot :-)

I've got an even better algorithm by Yartrebo · 2007-02-25 12:48 · Score: 1

And here's a sort that can be up to infinity times faster than mergesort:

void (linked_list * list)
{
if (!list->sorted)
{
qsort(...);
list->sorted = TRUE;
}
}

This algorithm has a best case of O(1), which is better than mergesort (best case is log(n)). Somehow the 'up to' part in 'up to 10x faster' doesn't instill much confidence.

Re:I've got an even better algorithm by DarkDust · 2007-02-25 20:24 · Score: 1

You better buy yourself a copy of Donald Knuth's The Art Of Computer Programming, part 3 (sorting and searching). Quicksort may be OK in some cases, but most of the time there's a better algorithm. Also the best case of quicksort is O(n log(n)), not O(1).
Re:I've got an even better algorithm by francium+de+neobie · 2007-02-25 22:52 · Score: 1

Load up an a 1GB array and sort it with your code, tell me, is it O(1) anymore?

I hate to say it but at least the I-reinvented-radix-sort kid did some experiments...
Re:I've got an even better algorithm by Yartrebo · 2007-02-26 03:38 · Score: 1

Read my code carefully, and remember that I'm comparing best cases, not average or worst cases. Since the words 'up to' were used in the original 'up to 10x' claim, it's my choice to make.
Re:I've got an even better algorithm by Yartrebo · 2007-02-26 03:41 · Score: 1

My code doesn't always invoke quicksort. Since I'm only concerned about best case performance (to make my 'up to' claim sound better), I wouldn't mind using a bubble sort for the other cases as it wouldn't change it a bit. Quicksort is part of the standard C libraries, so that's why I used it.

Call it "Please click the ads pls..." by Anonymous Coward · 2007-02-25 13:04 · Score: 0

I did appreciate the number of google ads (plus the inconspicuous donation button) along with the "License" that said 'don't copy this text, rather link to the page'. The sort isn't earning him the money...showing up on /. is.

You seem to have some emotional problems. by Anonymous Coward · 2007-02-25 13:27 · Score: 0

First of all, he deserves to be made fun of. If you think you have discovered something new and amazing, how about taking some time out to find out if you actually did or not before making nonsense claims? And nobody is saying he's a bad person for this, just that its stupid and shouldn't have been posted. It reflects more on the /. editor who was moronic enough to front page this than it does on the guy who wrote the (absolutely terrible) code. (kdawson, this is why clueless fucktards hide out in the games section).

And finally, trying to GPL it (as stupid as that is) is very much a bad thing. Its equally bad as trying to cash in on it. Either way you are restricting the use to a select group of people, wether those are people willing to pay, or people who are GNU/communists doesn't make any difference. If you want to make something free, then make it free, for everyone, for anything, period. Otherwise don't try to pretend you are doing something altruistic, when you are really doing something selfish.

Re:You seem to have some emotional problems. by Anonymous Coward · 2007-02-25 18:03 · Score: 0

You seem to have some emotional problems.
The irony of someone like you, clearly at most one fuse away from going on a Columbine-style rampage, responding in that way to someone as mature and nuanced as the parent poster makes me suspect I'm being trolled. If so, hats off to you. Otherwise, take a look in the mirror. Try not to smash it.
First of all, he deserves to be made fun of. If you think you have discovered something new and amazing, how about taking some time out to find out if you actually did or not before making nonsense claims? And nobody is saying he's a bad person for this, just that its stupid and shouldn't have been posted.
There are post ridiculing his code, his CV, his nickname, his hobbies, his spelling, and just about anything else you can find on the linked page. I don't know what good could possibly come out of that. He deserves to have pointed out to him that his algorithm isn't new, but that's about it. Even the sloppy code would have been forgiven if his algorithm was indeed new.
It reflects more on the /. editor who was moronic enough to front page this than it does on the guy who wrote the (absolutely terrible) code.
I agree. He should indeed be criticized.
(kdawson, this is why clueless fucktards hide out in the games section).
...but not like that. You're not the one with emotional problems? Right.
And finally, trying to GPL it (as stupid as that is) is very much a bad thing. Its equally bad as trying to cash in on it. Either way you are restricting the use to a select group of people, wether those are people willing to pay, or people who are GNU/communists doesn't make any difference. If you want to make something free, then make it free, for everyone, for anything, period. Otherwise don't try to pretend you are doing something altruistic, when you are really doing something selfish.
I'm not sure how doing something which means you're abandoning all control of your product (regardless if it means only people with certain preferences will be able to benefit from that) could reasonable be described as "selfish". But sure, you're entitled to your incoherent and hateful attitude towards the GPL. However, that's a discussion for another day. If that's how you feel about the GPL, complain to RMS, not to some clueless Greek guy who probably just used it for the first time in his life.

Re:10x? by Anonymous Coward · 2007-02-25 13:32 · Score: 0

RAIDICKS!

Re:sh1t by Anonymous Coward · 2007-02-25 14:15 · Score: 0

For once, goatse is on topic!!!

Mere Aggregation by Makoss · 2007-02-25 15:11 · Score: 1

A textbook could contain a work covered by copyright, and licensed under the Gnu GPL without requiring the entire book to be released under GPL. The relevant portion of the GPL is at the end of Section 2:

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

Mere Aggregation is, in my opinion, the most overlooked aspect of the GPL. It gives developers the freedom to work with GPL code, without requiring that everything they touch then be GPL. Thus allowing GPL code to work its way into a closed system, one component at a time, and may the best code win. In this manner GPL projects can then attract contributers and maintainers from a much larger developer pool, thereby increasing the overall robustness of the community.

This is the "real" viral nature of the GPL, not that FUDish crud about 'if anything is GPL then everything is required to be GPL'.

(Please note, I am not talking source code level components, but rather independent pieces of software working in concert.)

--
Building a better backup.
Zettabyte Storage

Ripping apart the source by jinxidoru · 2007-02-25 15:27 · Score: 4, Informative

The previous few posts caused me to also open up the code. Wow! What a ride! I decided that I wanted to add a few things I found as well.

I loved that he referred to his use of the short pointer as a "cool pointer hack to avoid shifts and ifs!!!" To begin with, if we have to call that a "cool hack" then the requirements for coolness have definitely dropped a bit. I am also confused as to how this helps avoid shifts and ifs. As far as I can tell, it avoids one shift. I'd love to see the code without the "cool hack" because I'm intrigued as to how he uses an if statement to remove the higher bits.

As stated before, I am also fond of his use of an array of size 65535 rather than 65536. I surprised he didn't run into any segmentation faults. I know that I can give him a test case that will seg fault pretty easily.

I also question whether he understands the purpose of a header file, as the entire source for his BitSort is contained within a header file. I guess it would sort of make sense if the function was declared inline, but it isn't.

Another fun element is how non-modular his code is. Everything is hard-coded, even though it would have been easy to declare a few constants or even make the function a template.

Lastly though, why the crap did this article get through the editors. It is, as has been stated over and over, a radix sort and nothing different. I know that he claims it is different because it is in-place, but that is such an obvious simple change that it does not warrant a whole new algorithm. Regardless of the content, the article is so poorly written with some of the worst grammar I have read in quite some time. Slashdot really needs to improve its article standards.

Re:Ripping apart the source by tholomyes · 2007-02-25 21:27 · Score: 1

Well, now the page just has a link to the Wikipedia page on radix sort, which is kind of a shame. thedailywtf.com-worthy code is always fun, so thanks for the highlights!

"At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul."

--
When did the future switch from being a promise to a threat? -C. Palahniuk
Re:Ripping apart the source by Anonymous Coward · 2007-02-25 23:56 · Score: 0

As stated before, I am also fond of his use of an array of size 65535 rather than 65536. I surprised he didn't run into any segmentation faults. I know that I can give him a test case that will seg fault pretty easily.

I doubt that. [65535] will almost certainly point inside a valid page. The variable declared after (or before) that array might well get trashed, but it won't segfault.
Re:Ripping apart the source by poot_rootbeer · 2007-02-26 03:14 · Score: 1

I'd love to see the code without the "cool hack" because I'm intrigued as to how he uses an if statement to remove the higher bits.

I retrieved an earlier version from his source control:
if (val > 2147483647) { val = val - 1073741824; goto done; if (val > 1073741824) { val = val - 536870912; goto done; if (val > 536870912) { ... } } } done:

Re:A couple of non-obvious notes about radix sort by adrianmonk · 2007-02-25 15:31 · Score: 1

One cute feature of radix sort for very large data sets is that you only need linear access to the data, so it can be done with non-random-access external storage. The one practical implementation I've ever heard of used four 9-track tape drives.

You can do that with merge sort as well, right? First you write n/2 unsorted records to tape A and the other n/2 unsorted records to tape B. Now you say each of those tapes has n/2 sorted lists of length 1. Then, you read both A and B at once, and you merge the first two sorted lists on A and B and write them to tape C. You take the second two sorted lists on A and B, merge them, and write them to tape D. Continue in this manner, alternating your writes between C and D, until you run out of data on tapes A and B.

Now, tapes C and D each contain a sequence of sorted lists, but each one is twice as long as the lists on tapes A and B. So you repeat the step and read from C and D and write to A and B. Every time you do this, the sorted lists get twice as long and there are half as many of them, until eventually you have one list with all the sorted elements.

For extra added efficiency, take the original input and sort it using whatever sort you choose (including mergesort) in RAM before you do the initial write to tapes A and B. This reduces the number of passes to log2(data size / ram size) instead of log2(data size).

Got to consider all the metrics by fireboy1919 · 2007-02-25 15:51 · Score: 1

The best best case sort you can get is n. An optimal algorithm would always hit that.

What it has is the optimal worst case, which is not the same thing at all.

When picking a sorting algorithm, you have to consider the best, worse, and average cases, and specifically how it deals with the kind of dataset you have.

Mergesort's best, worst, and average cases are all the same - nLog(n)
It's sort of like a compromise. It's not really fast at any particular case, but, then again, it's never bad.

If your data is perfectly random, or you know nothing about it, then merge sort is your algorithm.

However, on semi-sorted data, quicksort performs better or worse than merge sort. Better if it's almost sorted in the correct order, (O(n) if it is sorted in the right order), or worse if it's sorted mostly in reverse order O(n^2) if entirely in reverse order.

Most data that is created by humans is partially sorted (i.e. in human data entry cases)...so there are many places that quicksort is usually a better choice. However, given any unknown set sample, there is a much higher likelihood that the set will be mostly random - meaning on average when considering all possible sets, mergesort performs better.

These are the two extremes in the sorting world - quicksort has best best case, but the worst worst case, while merge sort has the best worst case, and the worst best case.

All other sorting algorithms that are considered efficient fall between these two extremes and take advantage of some piece of knowledge about the data (keep in mind that sorting algorithms in general *don't* distinguish between in-place and not in-place. This is almost universally a trivial detail that doesn't effect the order of the algorithm) that lets you do better than O(nLog(n)) on that specific kind of data.

It always seems strange to me that people learn what these two algorithms are without learning how they define the boundaries of the solution space. If you don't know what they're for then what's the point of learning more than one at all then?

--
Mod me down and I will become more powerful than you can possibly imagine!

He is a beggar by Anonymous Coward · 2007-02-25 16:19 · Score: 0

A$$hol is a beggar. Has removed the link and inserted Make Donations.

Quoting Knuth... by Ellen+Spertus · 2007-02-25 18:18 · Score: 2, Funny

cartoon about Knuth

Re:Quoting Knuth... by Anonymous Coward · 2007-02-26 15:31 · Score: 0

This was a gag from "AL TV" years ago (maybe the joke is even older).

Weird Al is doing the news (IIRC) and says "A spokesperson for the Beatles was quoted as saying, 'What? Where did you get this number? Leave me alone. Just leave me alone.'"

Besides - this Knuth cartoon is much funnier - http://knking.com/recbooks/index.html

More of VirusFree's Offerings: by the_mushroom_king · 2007-02-25 18:18 · Score: 0

Somehow ironic:

Buffer Overflow in Action Tutorial - Part 1

Spoiled by Database by Tablizer · 2007-02-25 19:07 · Score: 1

Ever since I got out of school, I almost never had to worry about sorting algorithms. I let the database do the sorting almost all the time (whether desktop DB's like FoxPro and MS-Access, or Oracle). Perhaps it is important in embedded software or systems software, but for business applications it seems most let the database do it, and we forget all those Comp-Sci sorting algorithms.

--
Table-ized A.I.

There should be an option for... by jamesmacaulay · 2007-02-25 19:08 · Score: 1

+1 Venomous

Re:A couple of non-obvious notes about radix sort by tajribah · 2007-02-25 20:50 · Score: 1

Ad b: This is not true. If you assume that the keys grow polynomially with n, let's say n^3, then just let the number of buckets in the radix-sort grow linearly (i.e., use radix n instead of some fixed radix). Then the number of passes will be equal to the degree of the polynomial, so in the cubic case three passes will be sufficient, leading to a linear-time algorithm again.

Radix-sort stops being usable if the integers grow superpolynomially, but then you can use a bunch of other techniques (see papers on sorting integers by Mikkel Thorup), leading to complexities on the order of n*log log n or even better.

Oops, reposting by rbarreira · 2007-02-25 21:33 · Score: 1

Oops, my previous reply had a misspelling in the blockquote tag!

Linear improvements are completely uninteresting. Take algorithm X in Perl, then a version in optimized assembly, and there you go, a 10X improvement without having done anything groundbreaking.

That's the kind of thinking which leads to the current kind of bloated and slow programs. Of course a better algorithm is always the first step, but programmers should always try to optimize their bottleneck algorithms. Would you like it if your software was 10 times slower? What about 100 times slower?

In my opinion, it is ridiculous to say that linear improvements are irrelevant. If we believe the current predictions about CPUs (number of cores increasing as opposed to GHz), this is even more true, since the old excuse of "the hardware will get better in 6 months" will stop being true.

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

Re:Oops, reposting by DaleGlass · 2007-02-25 22:29 · Score: 1

That's the kind of thinking which leads to the current kind of bloated and slow programs. Of course a better algorithm is always the first step, but programmers should always try to optimize their bottleneck algorithms. Would you like it if your software was 10 times slower? What about 100 times slower?

Completely wrong when talking about a sorting algorithm. A 10X improvement in framerate is indeed a really impressive one. But when sorting, a 10X improvement compared to what? It's useless to talk about "10X faster than merge sort". A sort algorithm's performance depends on its complexity. Bubble sort is O(n^2) worst case, but for 3 elements is probably faster than any other sort out there just because it's so simple.

But try to sort a few million elements, and the situation dramatically changes: bubble sort will never catch up. It doesn't matter that you make the code 10X faster, for large enough amounts of data the performance difference between the most optimized bubble sort and the crappiest merge sort will be dramatic.
Re:Oops, reposting by rbarreira · 2007-02-26 00:22 · Score: 1

That's precisely why I said that improving the algorithms is more important than optimizing an implementation. No need to preach to the choir here, we all know that complexity is very important; that's not what I was attacking about your post.

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

What a shameless self-promotion. by master_p · 2007-02-25 22:03 · Score: 1

Slashdot editors should have rejected this story.

Re:What a shameless self-promotion. by Anonymous Coward · 2007-02-26 02:33 · Score: 0

Slashdot editors should have rejected this story.

Look at it more positively. By spotlighting this programmer (and perhaps along, his team), the World will (hopefully) remember not to give these guys any serious consideration at a later time.

Re:FROTHY PISS by Brad+Eleven · 2007-02-25 23:12 · Score: 0, Flamebait

Now, now, VENONA. You're simply experiencing the #1 tradeoff in having above-average intelligence: More of your fellow human beings are simply going appear stupid to you.

And thank you for having expressed my point of view entirely. The first thing the snap-judgment part of my mind presented me when reading the AC post was, "Probably drives a Hummer, proudly."

--
"Press to test."
(click)
"Release to detonate."

10 times faster? by XSpud · 2007-02-26 00:11 · Score: 1

So mergesort's O(n log n) is replaced by O(n/10 log n). That's impressive!

Cool down slashdotters! by junglee_iitk · 2007-02-26 00:27 · Score: 1

The guy has replaced the page with a link to Radix sort on wikipedia.

Gym Class by Anonymous Coward · 2007-02-26 00:40 · Score: 0

Somebody said that this isn't a gym class and this guy was treated a bit harsh. Apparently People replying to this have been seen the dark side of the gym class, and here is their chance of giving their revenge on the newbie :P Nice move from the poster to put this guy infront of the firing squad. Now he has been called retard, his CV and everything he tried to do was shot down. Welcome, newbie, to the world of mature professionals :) Critisism is in place, but calling him a retard is a bit too much. Newbie perhaps, but retard? Ignorant or stupid? Well, hopefully this has made the world a better place. Mr. Virusfree will atleast think twice before publishing anything in fear of getting into slashdot, where newbies get what they deserve. I don't think this guy deserved all this. Is the function of /. to kill newbies? Hows about starting a week of newbie hunting, and searching the web for sites made by kids doing silly things with computers and introducing them to the professional firingsquad and hopefully make these 'retards' to stop touching computers ever, since they apparently are too stupid to be allowed to use them. See how many websites will Slashdot be able to take down by public ridicule campaigns? Revenge the treatment you got in gym classes, theres always somebody weaker than you to smite!

Slashdot headings. by drolli · 2007-02-26 01:01 · Score: 1

In the winter it is much colder than in the night.

Re: All physical computers are O(1) by Lobachevsky · 2007-02-26 03:59 · Score: 1

As long as a turing machine uses finite memory of length N, there exists a DFA of N^2 length that accomplishes the same in O(1). Essentially, your stick of memory is a finite number (X). Your algorithm, whatever algorithm it is, deterministically translates X into a new finite number (Y). I can avoid running the algorithm by having a static array that maps X to Y. To turn a 16-bit computer into an O(1) DFA, I need 2^32 bits of memory (a 65536-element array where each value is a 65536-bit string).

Anytime O(...) notation is used, it assumes arbitrary length input (a turing machine is never assumed to work on a finite tape, because it would so obviously reduces to a DFA otherwise).

Furthermore, optimal sorting is rarely expressed as O(N lg N), it is O(N lg k) where k is the # of unique elements. Because most textbooks want to use single-variable Big-O notation, they stick with O(N lg N), but more serious analysis uses multi-variable complexity. k is only pseudo-dependent on N, in that it has to obey the relationship k = N. In the worst case, k = N, and hence sorting is N lg N when expressed in terms of a single-variable. If we're serious about sorting it is O(N lg k). When k is finite, it reduces to merely O(N).

Links to a page of clickthrus? by kfsone · 2007-02-26 04:31 · Score: 1

Seems like someone wanted to make some clickthru money from /. ?

--
-- A change is as good as a reboot.

A little experiment by ge · 2007-02-26 08:25 · Score: 1

I've written both an implementation of Radix sort and Merge sort in C using linked lists. With integer keys Radix sort is indeed faster by about a factor of 2 on large lists (10^7 keys). The stack management for the non-recursive Merge sort take more time than the very simple operations in the Radix sort. The Radix sort is fairly simple: typedef uint32_t KEY; enum { KEYBITS = 32, STEPBITS = 11, }; enum { NUMBUCKETS = 1 << STEPBITS, }; #define MASK ((KEY)(NUMBUCKETS-1)) typedef struct node NODE; struct node { KEY key; NODE *next; }; NODE * sort(NODE *a) { NODE *bucket[NUMBUCKETS+1]; NODE **tail[NUMBUCKETS]; unsigned shift, i; bucket[NUMBUCKETS] = NULL; for(shift = 0; shift < KEYBITS; shift += STEPBITS){ for(i = 0; i < NUMBUCKETS; i++) tail[i] = &bucket[i]; while(a != NULL){ unsigned index = (unsigned)((a->key >> shift) & MASK); *(tail[index]) = a; tail[index] = &a->next; a = a->next; } for(i = NUMBUCKETS; i-- > 0;) *(tail[i]) = bucket[i+1]; a = bucket[0]; } return a; }

A note on quicksort by gr8_phk · 2007-02-26 13:55 · Score: 1

Quicksort has a typical runtime of O(nLog(n)). It has a worst-case time of O(n^2). For this reason I have a preference for heapsort which has a worst case of O(nLog(n)) even though it is typically slower than quicksort by a constant factor depending on the implementation. I consider the constant a minor point, others may consider "typical" versus "worst case" a minor point. When talking O() one should use worst case.

Here is what I do not understand by dreamchaser · 2007-02-27 03:06 · Score: 1

How can he put an algorithm under the GPL? How is that any better than someone patenting an algorithm? Is it OK because it's the GPL and this is Slashdot? I know that a license != a patent, but if one supports the idea that algorithms should not be protected IP then shouldn't we all be crying foul over that?

Not to mention that this doesn't appear to be any kind of new algorithm, just a 'rebranded' one.

Slashdot Mirror

Sort Linked Lists 10X Faster Than MergeSort

326 comments