Sorting Algorithm Breaks Giga-Sort Barrier, With GPUs


Posted by timothy on Sunday August 29, 2010 @02:22PM from the quick-like-double-time dept.

An anonymous reader writes "Researchers at the University of Virginia have recently open sourced an algorithm capable of sorting at a rate of one billion (integer) keys per second using a GPU. Although GPUs are often assumed to be poorly suited for algorithms like sorting, their results are several times faster than the best known CPU-based sorting implementations."

16 of 187 comments


Excel Charts by Anonymous Coward · 2010-08-29 14:30 · Score: 3, Insightful

I find it very disappointing that a group of programmer/computer science types, who even supply BibTeX to make it easier to reference their work, resort to screen-capturing an Excel chart to display their data.
1. Re:Excel Charts by Anonymous Coward · 2010-08-29 15:00 · Score: 1, Insightful
  
  Who cares what tool they use to display data, if they're getting their point across in an effective manner? Good lord...
2. Re:Excel Charts by Anpheus · 2010-08-29 15:21 · Score: 3, Insightful
  
  Maybe excel was just the right tool for the job? It's quick and easy to use, and to reformat the graphs.
  I know the Linux tools tend to be a little longer between tweaking, rendering and displaying, so a fast WYSIWYG tool works just fine.
3. Re:Excel Charts by pspahn · 2010-08-29 18:29 · Score: 4, Insightful
  
  Actually, I find it even more disappointing that a decent way to display datasets on a web page isn't standard yet. Why can't a nice one be embeddable with column sorts and robust methods for retrieving data? There are solutions, sure, but I have yet to find one that isn't unnecessarily complex or just plain ugly and difficult to use. But I guess it's just a matter of time, right?
  
  --
  Someone flopped a steamer in the gene pool.
4. Re:Excel Charts by dominious · 2010-08-29 22:54 · Score: 2, Insightful
  
  what. the. fuck. +4 Insightful because they use Excel charts?
  
  Hey I just solved P=NP.
  Yeah, but you are using Excel charts...hmmm sorry kthnx later.
5. Re:Excel Charts by multipartmixed · 2010-08-30 02:40 · Score: 2, Insightful
  
  1. Why do you believe those are screen captures, rather than, say, exported images?
  2. How would the data look different if it were displayed with BibTeX?
  3. How fast is using BibTeX? (I've never used it). I could create those same charts in Excel '97 from a CSV of input points easily; probably in under a minute.
  
  --
  
  Do daemons dream of electric sleep()?
Not a barrier by Captain+Segfault · 2010-08-29 14:49 · Score: 5, Insightful

This isn't a "barrier" like the "sound barrier". There are no special difficulties that start around 1G/sec! It's just a threshold.
Don't get me wrong -- I'm not saying this isn't impressive, but no "barrier" was broken here!
1. Re:Not a barrier by XanC · 2010-08-29 14:51 · Score: 4, Insightful
  
  It's not a threshold! It's just a milestone.
2. Re:Not a barrier by caerwyn · 2010-08-29 15:20 · Score: 2, Insightful
  
  Actually, if you look at shockwave dynamics during the moment an object crosses from subsonic to supersonic velocity, it can very easily be considered much more of a barrier than 1gkeys/sec can.
  
  --
  The ringing of the division bell has begun... -PF
Um... by Anonymous Coward · 2010-08-29 14:50 · Score: 5, Insightful

Algorithms aren't measured in "x per second"... only implementations are measured that way. The speed of an algorithm is described in big-O notation, such as O(n log n). The metric of "sorted keys per second" is largely useless, because it depends on the particular hardware setup.
1. Re:Um... by Anonymous Coward · 2010-08-29 19:21 · Score: 1, Insightful
  
  The effect that extra CPUs will have is too dependent on the hardware implementation to be able to formalize like this.
  It's no more exotic than having several levels of cache between the CPU and RAM (or even treating RAM as a cache between CPU and network/spinning disk). Even O(N) algorithms curve when N overflows L1, L2, L3, and RAM.
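
To make the "keys per second" point concrete, here is a minimal sketch of how such a figure is usually measured. It describes one implementation on one machine, not an algorithm. The helper name `keys_per_second` and the input sizes are assumptions made for illustration, and Python's built-in `sorted` stands in for whatever sort is being benchmarked; interpreter overhead will blur the cache effects described above.

```python
import random
import time

def keys_per_second(sort_fn, n, seed=0):
    """Time one run of sort_fn on n random 32-bit keys and return keys/sec."""
    rng = random.Random(seed)
    keys = [rng.getrandbits(32) for _ in range(n)]
    start = time.perf_counter()
    result = sort_fn(keys)
    elapsed = time.perf_counter() - start
    # Sanity check, done outside the timed region.
    assert result == sorted(keys)
    # Guard against a zero reading on very coarse clocks.
    return n / max(elapsed, 1e-9)

# Hypothetical sizes; the rate typically changes as n outgrows each cache level.
for n in (10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: {keys_per_second(sorted, n):,.0f} keys/sec")
```

The printed rates will differ from machine to machine, which is exactly why the number only means something alongside a description of the hardware.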
Ugh. by martin-boundary · 2010-08-29 15:29 · Score: 3, Insightful

Stupid HTML ate my <...
The problem with big-oh notation is that the constant isn't explicit, so for any given n (pick as large as desired), it is possible that O(nlogn) < O(n) for some choice of constants. That's why ops per second is still a useful metric when comparing implementations on standardized hardware.
As always, in theory there's no difference between theory and practice, but in practice there is...
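
The point about hidden constants can be worked through with a toy cost model. This is only a sketch: the constants 40 and 1 are hypothetical, chosen to show that an O(n log n) routine with a small constant can stay cheaper than an O(n) routine with a large one up to very large n.

```python
import math

def linear_cost(n, c):
    """Modeled cost of an O(n) routine with constant factor c."""
    return c * n

def nlogn_cost(n, c):
    """Modeled cost of an O(n log n) routine with constant factor c."""
    return c * n * math.log2(n)

def crossover_n(c_linear, c_nlogn):
    """Input size at which both models cost the same.

    c_linear * n == c_nlogn * n * log2(n)  =>  n == 2 ** (c_linear / c_nlogn)
    """
    return 2.0 ** (c_linear / c_nlogn)

# Hypothetical constants: the linear routine pays 40 units per key,
# the n log n routine pays 1 unit per key per level.
C_LINEAR, C_NLOGN = 40.0, 1.0

for n in (1_000, 1_000_000, 1_000_000_000):
    cheaper = "n log n" if nlogn_cost(n, C_NLOGN) < linear_cost(n, C_LINEAR) else "linear"
    print(f"n = {n:>13,}: the {cheaper} model is cheaper")
print(f"crossover at n = {crossover_n(C_LINEAR, C_NLOGN):.3g}")
```

With these made-up constants the crossover sits at 2^40 keys, so the asymptotically worse routine wins for every input that fits in a typical machine's memory.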
most real life sorting involves indirection by Anonymous Coward · 2010-08-29 16:04 · Score: 2, Insightful

The typical sorting problem I've encountered in my career (various types of scientific, telecommunications and business software, though not games) involves an array of pointers to fixed length records that have a sort key (let's say an integer) at a predefined offset. Not an array of integers, nor an array of directly embedded small fixed length records which I'm guessing was used in TFA. The former situation requires random as well as stream access to memory, which would likely favor processing by the CPU in the motherboard of a typical $1000-$2000 PC.
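
A minimal sketch of the indirect layout described above, assuming a made-up record format: an 8-byte name followed by a 4-byte integer key at offset 8. The names `RECORD`, `KEY_OFFSET`, `make_heap` and `sort_pointers` are hypothetical, and byte offsets into one buffer stand in for pointers.

```python
import struct

# Hypothetical fixed-length record: 8-byte name, then a 4-byte
# little-endian signed integer key at byte offset 8 (12 bytes total).
RECORD = struct.Struct("<8si")
KEY_OFFSET = 8
KEY = struct.Struct("<i")

def make_heap(rows):
    """Pack (name, key) rows back to back into one byte buffer."""
    buf = bytearray()
    for name, key in rows:
        buf += RECORD.pack(name, key)
    return buf

def sort_pointers(buf, pointers):
    """Return the record offsets ordered by the key each one points at.

    Every key is fetched through its pointer, so memory is read in
    pointer order rather than streamed front to back.
    """
    return sorted(pointers, key=lambda p: KEY.unpack_from(buf, p + KEY_OFFSET)[0])

rows = [(b"carol", 30), (b"alice", 10), (b"bob", 20)]
heap = make_heap(rows)
pointers = [i * RECORD.size for i in range(len(rows))]
ordered = sort_pointers(heap, pointers)
names = [RECORD.unpack_from(heap, p)[0].rstrip(b"\0") for p in ordered]
print(ordered, names)
```

Only the pointer array is reordered; the records themselves never move, which is what makes the access pattern random rather than sequential.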
Re:No by black3d · 2010-08-29 16:52 · Score: 3, Insightful

Unfortunately I can't mod having already posted in this thread, but please allow me to /bow. This is the best explanation I've ever read anywhere for the differences. Even I knew the differences but couldn't have expressed it so finely. Bravo.

--
"The true measure of a person is how they act when they know they won't get caught." - DSRilk
Re:Big deal. Radix sort works well IF ... by Trepidity · 2010-08-29 17:28 · Score: 4, Insightful

Well, yeah, they're not claiming they invented radix sort. They're claiming that their GPU implementation of radix sort runs about 4x as fast as the CPU implementation you describe.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:Big deal. Radix sort works well IF ... by evilWurst · 2010-08-29 19:19 · Score: 2, Insightful

It's generally not size of RAM that breaks radix sort; it's the size of cache. Modern processors are highly reliant on cache, which means they're highly reliant on things in memory being in small tight chunks that fit in cache - because cache misses are expensive enough that if you thrash cache badly enough, you may end up running slower than if you hadn't had any cache at all.
Good comparison sorts may start fragmented, but by their very nature each pass of the algorithm makes them less so. Radix sort is the other way around; it follows pointers (so more precious scarce cache in use already) that point in more and more fragmented patterns with every pass. That's why even though radix sort's average speed is theoretically faster than quicksort, quicksort still wins on real life hardware. And that's probably why radix sort wins on GPUs - the data fits in the card's dedicated memory, which is already optimized to be accessed in a much more parallel way than main memory.
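
For readers who have not seen it written out, here is a plain CPU-side sketch of least-significant-digit radix sort on 32-bit keys. It is not the researchers' GPU implementation; the function name and the 8-bit digit width are assumptions. The scatter in step 3 is the part that spreads neighbouring inputs across the output array on every pass.

```python
def radix_sort_u32(keys, bits_per_pass=8):
    """LSD radix sort for integers in the range [0, 2**32)."""
    buckets = 1 << bits_per_pass
    mask = buckets - 1
    src = list(keys)
    for shift in range(0, 32, bits_per_pass):
        # 1. Histogram: count how many keys have each digit value.
        counts = [0] * buckets
        for k in src:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: first output slot for each digit value.
        offsets = [0] * buckets
        total = 0
        for d in range(buckets):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter: neighbouring inputs can land far apart in dst.
        dst = [0] * len(src)
        for k in src:
            d = (k >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1
        src = dst
    return src

print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
```

With 8-bit digits this makes four passes over the data regardless of input order, each one a full read and a scattered write.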