Impressive Benchmarks: Sorting with a GPU
An anonymous reader writes "The Graphics research group at the University of North Carolina at Chapel Hill has posted some interesting benchmarks for a sorting implementation which is done entirely on a GPU. There have been efforts on doing general purpose computation on GPUs before (previous Slashdot article). However, most of them had generally utilized the fragment processing pipeline of the GPUs which is slower then the default high speed rendering pipeline. Apparently, the above implementation is done using "simple texture mapping operations" and "cache efficient memory accesses" only. There also seems to an option to download the distribution for non-commercial use, though the requirements seem pretty hefty (a very decent nVidia graphics card and the latest nVidia drivers)."
~~~
So the GPU is an additional processor whose power can be used. Any card with an additional processor could do the job. Considering that, much like 3D, most algorithms/operations are often the same, much processing power would be gained with such cards and a standardized library alla opengl.
\u262D = \u5350
"Who the fuck has enough data to sort on a regular basis that they'd need hardware-assisted sorting?"
Database admins?
Who the fuck has enough data to sort on a regular basis that they'd need hardware-assisted sorting?
Perhaps you've heard of, I dunno.. Google, or Oracle?
This isn't exactly a fair test as I see it. As far as I can make out they've put a custom sorting algorithm up against the standard C library qsort. How about some comparisons of this GPUsort against other sorting algorithms run on the CPU?
I'd love to see Judy-style thinking applied to GPU problems..."
..
especially since i use Judy arrays for tons of things on two different architectures, and its just a darn efficient hash library for pretty much all of my needs
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
Sorts are very common in applications and _can_ be slow. In a future where everyone has a GPU this can effectivly serve as a dual processor setup. Just as the FPU helped us 10 years ago with floating point operations.
Great.. but. does it run linux?
the fragment processing pipeline of the GPUs which is slower then the default high speed rendering pipeline
For the past two generations or so (starting with the Radeon 9800), there has been no such thing as "the default high speed rendering pipeline". The only circuitry present in the chip has been for evaluating shaders, and the fixed-function pipeline has been implemented as "shaders" that the driver runs on the chip automatically.
At least, I know for a fact this is true of ATI chips, and would not be at all surprised if nVidia is doing something similar.
Apparantly, the above implementaion is done using "simple texture mapping operations" and "cache efficient memory accesses" only.
Fry: Magic. Got it.
First, congratulations to J. Ruby who pioneered this work and is mentioned throughout the article. I work with him, and on his desk are _already_ four machines with dual GeForce 7800 in SLI mode. Talk about lust factor.
Ruby's first published work was the SETI@HOME modified client which uses Nvidia (or) ATI GPU for the waveform FFT calculations. I have watched him steadily upgrade his Nvidia GPU up to this wicked 7800 arrangement he is using today.
Go Jim! You owe me a beer.
I probably don't know what I'm talking about, but I'm wondering....
What is the performance if the GPU is busy rendering when you play a game?
When the GPU is busy doing what it is supposed to do... a program should resort to qsort right?
(Smalltalk bit block transfer) He was always making assmptions about the underlying virtual engine.
He was always getting it wrong too. I have logged thousands of hours and thousands of miles, from Montreal, Canada to Lisbon, Portugal cleaning up after this yobbo. What a fuckup he was.
The opinion was shared too. It got to the point to where we could write code that would detect his code and, as soon as we came across it and confirmed it we would remove it and read the original spec to know what to code.
"Ghoul" was a geek's geek. He stayed married about a week to a co-worker's daughter. Sad in a funy sort'o way.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
However, most of them had generally utilized the fragment processing pipeline of the GPUs which is slower then the default high ...
What is so difficult about knowing when to use then and when to use than?
This kind of amazing stuff you can expect to see on Cell CPU. And you will not have to hack GPU, it is supported by Linux kernel.
839*929
I'd imagine that there are probably more entities than the OP realizes that would need this sort of sorting. I worked in a Fortune 100 retailer, in one of their 600 stores, which had 5000 individual transactions per day, with an average of 18-20 items per transaction. They routinely do data analysis on item movement, during what hours, bought in conjunction with what other items...blah, blah, blah.
You can do the math, but that's a lot of data at the corporate level.
...that the greatest threat to Intel's market domination in the future is not going to come from AMD but from a company such as Nvidia.
Take a look at what they are doing with their GPUs right now and you can understand why someone would suggest this.
- Toby
that, since graphic hardware is now completely programmable, ATI and Nvidia realese their specs, so that everyone can use the GPU as a specialized vectorial coprocessor. A CPU that cannot be programmed freely just makes no sense, the same should be for GPU.
Wondering why i am doing so strange posts? I am trying to get a "+5,Flamebait" or "-1,Insightful" rating.
The GPU has lots of separate processing elements that operate independently. It also supports floating-point vector operations. Quite a different animal from yer-average-x86.
This may be slightly offtopic insofar that it doesn't directly deal with the subject (sorting with a GPU) at hand, but I was wondering how these sorts of research projects overcome "floating point number weirdness" I've heard about doing GPU calculations (as per the implementation being non-IEEE)? Doubly, would someone in the know help explain what the aforementioned "weirdness" means?
I'm quite probably talking rubbish but... is hacking the GPU the best possible solution to the problem (i.e. fastest sort possible for the amount of money spent on the hardware)? Or is this just someone having some fun (which is fair enough)?
Someone please enlighten me!
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Most GPGPU (general purpose GPU) researchers are envisioning scientific purposes. It really galls many of us in the scientific community that GPUs are so much more powerful than CPUs (if one can efficiently use the parallel processing capabilities of GPUs), and yet mostly we have to let these powerful processors go unused because it is typically very difficult to use GPUs for non-graphical computations (hence the G in GPU, of course).
Ben Hocking
Need a professional organizer?
Are not most GPUs limited to 24 or 32 bit precision? The x86 processors can go to 80 bits.
a very decent nVidia graphics card
So is that like "average", except "very average"?
Probably not, however if a GPU is available and you cna do your sort operation on it, and the sort operation would take long enough to hog the main CPU + that it wont occupy busses etc to much, you might end up gaining from it i suppose.
(Given that the sort procedure is something that CAN be run in the background whilst the user is doing something else).
I highly doubt that companies like Oracle or Google would benefit from using GPUS in large scale instead of CPUS...
I've heard about doing GPU calculations (as per the implementation being non-IEEE)? Doubly, would someone in the know help explain what the aforementioned "weirdness" means?
There are just some functions missing, like different roundings and the missing double precision. GPUs are simply not optimized for scientific calculations, but that doesn't mean that they can't be programmed to be useful for those workloads.
Why I am really, really tempted to post this on an NVidia fansite with the caption
LOL NVidia suxxorz ATI ownz j00!
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
That is a sly page advertising http://www.lawyersandsettlements.com/ , ignore this asshole.
I think it's just fun.
Probably the reasons why GPU's are fast for the task they are designed for - a small amount of very fast (assuming you access in the right order) non paged memory and a very simple (no Out of order execution) but highly parallel processor make them bad at general purpose stuff.
On the other hand, I can imagine that you could build a sort coprocesseor in an FPGA - the fact that it was optimised for the algorithm might be able to outweigh the fact that FPGA implementations of a given piece of hardware will have a lower clock rate than customer silicon ones.
But no one seems to be doing it, and there's probably a good reason for that. Maybe in SSE5 or something. You have to wonder if the cost in chip real estate for a big chunk of programmable logic is going to bring more benefits than more cache, or ALUs and so on.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Anyway, the application does A LOT of sorting, I'd give an arm and a leg for a solution like this. If there was some way to ship all this sorting to the graphics card, I'd do it! So now there is, I'm more than wide awake :)
http://jcsnippets.atspace.com/ - a collection of Java & C# snippets
Seriously, like who here hasn't already done this themselves anyway. At least it's not a dupe I suppose !
While I'm sure it's an interesting and difficult programming challenge, it seems like the notion here is taking a specialized piece of hardware that's developed for one specific purpose (high-speed rendering of graphics) and using it for some other purpose (calculation)
While it's laudible that this has been tried and succeeded, is this anything more than the equivalent of someone porting linux to a calculator watch? Sure, it's hard, and impressive you were able to do it, but is it something that anyone will actually use?
If someone wants more sorting power on a box, is this really a viable/useful/most efficient way to get those additional cycles, over options like adding/upgrading existing processors, etc.?
It is true. The GPU floating point implementations are not IEEE compliant. There is an interesting study on the floating point behavoir of the GPU here (and the associated pdf)
Next time I need to hammer a bunch of nails in parallel, perhaps I'll consider using a bulldozer! :D
Your point is well made, of course. Nevertheless, with work like this (the GPU sorter) even off-loading a little work to the GPU can allow your CPU to do other work thereby shortening the wall-clock time required to do your computations (in theory - in practice, it can hurt you if you're not careful!). My research area (neural networks) is inherently parallelizable, but I am not yet aware of work to efficiently use GPUs for this purpose.
Ben Hocking
Need a professional organizer?
This is an implementation of the Bitonic sort.
;)
From the article, when comparing their sort to previous GPU sort: "These algorithms also implement Bitonic Sort on the GPU for 16/32-bit floats using the programmable pipeline of GPUs."
So as I understand it they made a very fast implementation (using the GPU) of an old algorithm suited to parallel processing: bitonic sort was published in 1968 (hey, where were the fast parallel processors in the late sixties
For general purpose sorting, the number of passes in radix sort is log n (e.g. the number of digits of a decimal number is ceil(log10(number) ), so it is also O(n log n). It's only an optimization in sparse sets.
Liberty you never use is liberty you lose.
Perhaps the time has come for separate GPU and video output cards. The video output card could be fairly dumb, perhaps on the motherboard, and the GPU card optional and usable even in servers without a video card.
So it's "hard coded" for a couple types. The standard qsort has you pass a function pointer to a comparison routine so it can sort anything. Standard qsort also sorts a list of pointers to the items - I bet this GPU sort works directly on the data. Implementing a less generic version on the CPU is likely to result in it being faster than the GPU sort.
The blurb at the end about increasing GPU speed with each generation is crap too. Both CPU and GPU performance are now limited by power dissipation issues.
It remains more effective to write stuff like this on the CPU rather than code for 2 different devices. Imposing the same limitations on generic C code will usually result in the same or better performance on the CPU. These "general purpose GPU" programs always seem to illustrate why the GPU is not general purpose.
Comparing with C qsort is strange, qsort will never work fast due to being unable to inline the comparison function. Hand-written qsort of floats should work much faster.
put Linux on it? man, i want to run Debian Sarge on it :D
Computer science research being done at a liberal arts university? The universe is going to implode, SCO will run IBM out of business, Google will start selling our life history to terrorists, and George Lucas will make a movie that doesn't suck!
Disclaimer: I'm a tarheel born and bred. And a student at Chapel Hill.
I don't care about any general purpose computing with GPU.
What I want is a CPGA card, that is a Color Porno-Graphic Adapter, which accelerates drawing spherical triangles (aka Bolyai-Lobatschevskiy geometry).
That way we can get decent-looking boobs and asses in the upcoming Lara Croft game, composed from a few dozen spherical triangles instead of thousands of flat ones. That will result in 1000+ fps at 1600x1200 resolution.
I am truly fed up with those homonculus-looking bitches in GTA San Andreas. They are as desirable as Frankeinstein or an ork.
I was working on a sorting algorithm based on curve fitting. It's not a comparison sort, but more like a radix sort, but not one.
You start with a bunch of numbers, or things that can be given a number. Loop through the numbers, keeping track of High, Low, total, mean, std dev, etc. Use that information to develop an interpolating curve, which tells you for a given value where it ought to end up.
Put the Low at the front and the High at the back. On the next pass throught the numbers, put the number you swapped with Low about where it goes. It replaces something; put that where it goes, and so on.
I'm kind of stuck about what to do with collisions. It's led me into different directions, such as whether to segment and recur with a linear interpolant or to use several thresholds and group the numbers, making a higher order polynomial.
I decided to waste time on Slashdot, instead. I'm such a loser.
sigs, as if you care.
I can't wait for someone to port an MP3 encoder to GPGPU. A $300 P3, stuffed with 5 $200 videocards, could host a pool of MP3 encoders, faster than 10 Xeons which would cost $30k. And a lot easier to administer in a streaming farm.
--
make install -not war
I do.
We don't know the memory requirements for the GPUSort (this should be provided when comparing sort algorithms). We don't know if there is a limit to the amount of keys it can sort (what if the data doesn't fit in the GPU memory). Does the algorithm keep the original order when the key is the same?
In the case we are interested in, with enough memory radix sort is O(n) in speed so it would beat qsort and GPUSort for a large enough set of data. If we had been talking about larger keys, we don't know how GPUSort would fair either.
Will it detect this "mythical girlfriend" if she's naked? Or was this an illusion drawn up by your fevered mind?
/. always rememeber that.
this is
- moomin
a couple of years ago. Nvidia gave him a FX5900 (I think. It was the one that sounded like a hair drier and got pulled from the market) to do his research with. Anyways, check out his papers on the subject.
I don't really see the point. I mean, yes, it's kind of neat, but that's about it. It serves no practical purpose, really.
Not every machine has a GPU. I don't know, but I suspect GPUs aren't terribly compatible with each other, so for any sort of market, you'd have to code for multiple GPU types.
The fact is, co-processors have been around since the early x86 CPUs. Not just math co-processors, but Intel used to cell i860 and i960 co-processor boards (some with multiple CPUs) for just this sort of thing.
I'd also suspect that if your GPU is being used for sorting or some other calculation intensive operation, it's less useful as a GPU. If you don't need the GPU as a GPU, but you need the computing power, I suspect spending the additional money on more CPU power instead, is going to have a bigger overall payoff since it's going to speed up every operation, not just the ones a GPU can be adapted to.
So, again, I don't really see the point. Really, if you need specialized co-processing, getting a FPGA board will probably be a much better use of your money since it can be customized for many more tasks than a GPU.
"requirements seem pretty hefty (a very decent nVidia graphics card and the latest nVidia drivers)."
But in the first graph, it clearly shows the less expensive ATI card beating nVidia. I also don't understand why the latest nVidia drivers are a hefty requirement.
Are GPUs as reliable as CPUs for these tasks? I mean if there's an error in a GPU you might get a brief rendering glitch and lots of people may not notice.
In fact, some people are actually talking about ways of testing and allowing slightly broken graphics hardware to be used in situations where the _visual_ results won't be that off.
You could run a large simulation as a neural net in distributed Smalltalk, taking advantage of the inherent parallelizability (if such a word exists?) of the actions of a neuron firing when inputs hit certain critical tresholds.
Basically, it the same structure I would use when doing a terrain simulation using finite state automata. The wider you can spread the computational net, the more computational fish you can catch.
Its not the object that is so complicated, (a neuron is, uh, stupidly simple [despite its exquisitely complex bio-chemical processes], and can be likened to a wet on-off swich,) its the Relationship and Connections to other objects that become extremely rich.
Luckily connections are existentatial in nature and either, to wax Shakespearian, __be__ (and can request the objects objects connected to acknowledge each other,) or __not be__ (in which case the only thing that can happen is that a connection be created between the objects.)
One of the objects can itself be a connection which means that Relationships are recursive.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Quicksort exhibits n-squared behaviour when the data set is sorted but reversed. Most decent quicksort routines do a random shuffle of the data at the start to avoid this issue.
Choosing a sort for a particular situation is very much a matter of choosing. A shell sort is very fast for small data sets - it's quick to implement and follows C*O(n^1.27), where C is small compared to P or Q in your example. With mostly sorted data sets, the choice to sort algorithm is even more tuning sensitive - consider depth-sorting vertices in a rotating model - most of the data isn't moving around very fast and is exchanging a few places up and down the list. Sometimes the more trivial and naive sort works better than quicksort even for large mostly-sorted sets.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
In pattern recognition circles, there's a very simple alogorithm known as the k-nearest neighbors classifier. First you start with a big database of information. Let's say you have a database of a million bank loan applicants in the past, and for each one of those applicants, you have several variables: say, income, age, amount of loan, length of loan, total household income, etc. You also have final information as to whether that loan applicant ended up defaulting on their loan. Now, you have a new applicant come in, and you want to predict, are they a safe loan? The k-nearest neighbor algorithm says: just compute the distance (you have a lot of freedom as to how you define distance) between this new applicant and the million people in your database. For example, you can just take the sum of squares difference between all the variables. Now, take those million distances you've computed, sort them from nearest to farthest, and take the k closest instances, and basically have them vote on whether or not this new person will default. If k=500, and the 453 nearest neighbors all defaulted, then you suspect the new person will also default.
Sorting a million entries is not hard, but doing it quickly gets a bit difficult. If you want to get even fancier, you can scale the units along each dimension before you compute the distance, to give more weight to certain parameters (feature weighting, or wrapper bias, in the pattern recognition parlance). To do that really really well, you have to do an optimization. Because this is fraught with local minima, this requires either a lot of restarts or stochastic algorithms. Either way, you end up evaluating, and doing the million-entry sort, many hundreds or thousands of times.
Now, there are really fast ways of sorting out the k-nearest neighbors (KD-trees, for instance), and there are ways of pruning those million training points down, but this is just an example of a problem that involves colossal sorting. This algorithm is considered basic in that its very easy to understand and implement, and often performs very well, regardless of the structure of the data. However, just because its "basic" doesn't mean that it isn't used very heavily in real-life data mining situations.
Hope that sheds some light on the subject! Anyone else have another giant sorting application to share?
Except if just want to turn off zbuffer, when you draw transperant objects, you want to draw them back to front, because the end result will be errorous if you don't.
Perhaps there are other tasks, but that's the most common I suppose, and the one effect I've been using while coding 3d..
Albert
This might be off topic, but about a month ago
(after i was sure Frame Buffer Objects were implemented in my windows drivers) I tried
FBOs in Linux and they did not seem to be there.
Have you actually used FBOs in Linux, if so I would be really interested to know with which cards?
As far as I know ATI still does not have them for windows (for Linux? ha), and OSX does not have support for them for any card. I am hoping something has changed since I last tried.
I currently have a cheap GF 6 series, which I bought pretty much just for using FBOs in Linux.
when PCs start having the "GPU" on a separate card from the graphics card. The next step would be to have a GPU socket on the motherboard, just like in the old days of FPU co-processors. Soon after that, the CPU makers will start incorporating the GPU into the CPU again (just like the FPU was integrated in the 486 generation and later). And we have another complete cycle of functionality being first taken out of the main CPU and done in specialised hardware, and then being integrated back into the CPU...
moderators are afraid of criticism.. instead of doing the right thing, they censor facts
In high flow operations like sorting, memory latency and bandwidth matters.
Woh.
What seems unfair to me is comparing 24bit accuracy with 32bit. Yea, you can do it faster, but you're sacrificing a lot of the usefulness of doing it!
Does Slashdot have editors?
"slower then..."
"Apparantly..."
This is an epidemic. You may not think it's important, but it is. Please try to use whichever language you choose properly. That way your mistakes don't distract from the message, and you sound like you're actually smart. Notice how I used both "your" and "you're" correctly there? Amazing!
And editors, please actually EDIT submissions.
GPUsorting by Callele Neufeld & DeLathouwer at U of Saskatchewan - paper, videos & 2003 PPT.
suddenly paying $250 for a video card doesnt seem like such a bad deal...
Looks like we have a new videocard benchmark too.
my karma will be here long after I'm gone
Back in the day -- 1975 -- my first hack in college was to write a program to sort a collection of data by creating a file which had "line numbers" created in the same numerical as the data to be sorted. I then told the HP2000C Basic that this was a "program" and it dutifully sorted the "program" by its "line numbers."
This was at least 10 times faster than a Quick or Heapsort in Basic.
An unfortunate side effect was that the entire time sharing system went down to focus on completing this privileged task before swapping out for anyone else. Heh.
Evergreen State College System Admin Chas. Douglass was not happy with this sorting algorithm... but then we discovered what happened when we said Mat X[20000]=0...
A shorter answer to your question would be "yes" :)
http://jcsnippets.atspace.com/ - a collection of Java & C# snippets
I do know and understand the difference between its and it's, but your "rule of thumb" is retarded in this case. For all other nouns (not pronouns), an apostrophe-s signifies possession. It's the only remaining case-ending (well, two if you count plural), which is the genitive case. Latin and Greek nerds correct me if I'm mistaken.
So I think it's perfectly reasonable for someone to assume that "it's" is both a contraction for "it is" and a possessive of "it" at the same time. "His" is a more obvious exceptions to the apostrophe-s ending, although I can imagine someone incorrectly writing "her's" instead of "hers." If I was talking about Sally's cat, falsely expanding "Sally's" into "Sally is" would just cause confusion. Bottom line, if you meant to write a possessive using the standard English apostrophe-s, then there is no contraction to "explode" as you put it. The same would apply to "your's."
My personal pet peeve (although mistakes with you're, they're etc. are indeed annoying) is when people use an apostrophe for plurals! I once saw a sign in a dinner advertising a special on "ham and egg's." Blech.
Si la vida me da palo, yo la voy a soportar Si la vida me da palo, yo la voy a espabilar
Implementing a similar "tuple sort" on a CPU with the same restrictions would be much faster than generic qsort. They claim the Intel C++ qsort is optimized using hyper threading and SSE instructions (they don't specify so I assume it's just a compiler setting), but I don't see that offering the same advantages of restricting it to a "tuple sort".
Theirs is more useful than I first thought, but I still think the CPU should be able to match it by imposing similar restrictions.
BTW, I'm also a strong believer in heapsort vs quicksort. Most comparisons I've seen show quicksort about 1.5x to 2x faster, but they also omit some optimizations of heapsort that I use. Heapsort is guaranteed to be O(n*log(n)) where quicksort is probably that fast, but degrades to O(n^2) if you're unlucky.
I made a quick benchmark with the c++ STL-sort by sorting an array of 16 million 32-bit floats.
With a fairly old AthlonXP 2000+ the sort-command took around 4.5 seconds to run, which seems to be _almost twice as fast_ as the NVIDIA 6800Ultra in the linked benchmark, so in this light the sorting-performance of the GPUs is indeed not yet too impressive.
If the testers were using the fastest known software for the GPUs, it would probably also have been reasonable not to use terribly clumsy sorting implementations with the CPUs either.
and I thought DOOM 3 had serious graphics card requirements! Now Oracle database servers will require NVIDIA graphics cards?!
cpeterso
Perhpas in ten years, we'll get to hear a kid say, "GPU? Oh that stands for General Processing Unit, right?" eh? Perhaps not.
Furry cows moo and decompress.
So, does this work with PCI Express cards?
Or is there a dual-xeon board with an AGP slot?
Supermicro isn't offering one.
I think this little hack (if it's stable enough for production use) would be a nice addition to "anything database driven".
Most RDBs (well, any kind of DB I suppose) spend a significant amount
of time on sorting stuff. Moving the task to the GPU should result in a serious performance boost (offloading the CPU *and* sorting faster sounds like a pretty good deal to me)
Well, just a pipe dream?
(I'm finding it so hard to abstain from commenting on the wisdom of replacing marvelously stable and transparent file systems with databases... I guess I'll try this time.)
As database operations become more commonplace, even the average user will "benefit" from this. I place it in quotes because it was a problem that didn't need solving prior to the "use databases to manage your buddy list" mentality.
There are probably other, saner uses for it though, although people will have to weigh this benefit against the cost of creating another dependency on their system for a specific library tied to specific hardware, and the concerns of what happens when some of the parts are upgraded but not the others....
An email I just sent to the authors:
:)
Hi,
I've seen your page and the benchmarks are very interesting
But I have some doubts about the quality of the CPU benchmark - qsort has
some performance problems, since it involves calling a function indirectly
through a pointer for every comparison made, and it supports variable size
elements, etc... Besides, for sorting integers maybe sorting methods not
based on comparisons would be better, this would have to be tested though.
Did you try other sorting methods or a custom quicksort routine?
Regards,
Ricardo Barreira
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
I just had a bad Data Structures flashback from reading these posts. Next people are going to strat posting induction proofs.
well, radix sort can not be used for general purpose sorting. that is the drawback, the benefit is that it IS fast.
As for sorting integers or floats, radix sort IS O(n).
rs also has nice properties that it is best-case/average-case/worst-case O(n).
qsort is only O(n log n) for average case IFF all the input data to sort is completely random or carefully selected to perform well.
try qsort() and compare it with non-random real-world data and see how they differ.
You have shown your ability to read (other posts), rewrite the content and the need for more attention.
Now go somewhere else and jack off or whatever.
The previous comment [Mod parent up!] has got some funky HTML attributes [rel="nofollow"] and a weird local URL: Has someone discovered [and exploited] a bug in Slashcode?
Sorry - I was had.