Slashdot Mirror


Open Source Solution Breaks World Sorting Records

allenw writes "In a recent blog post, Yahoo's grid computing team announced that Apache Hadoop was used to break the current world sorting records in the annual GraySort contest. It topped the 'Gray' and 'Minute' sorts in the general purpose (Daytona) category. They sorted 1TB in 62 seconds, and 1PB in 16.25 hours. Apache Hadoop is the only open source software to ever win the competition. It also won the Terasort competition last year."

4 of 139 comments

  1. 100 bytes, 10 byte keys. by eddy · · Score: 5, Informative

    Probably why the second sentence in the article is "All of the sort benchmarks measure the time to sort different numbers of 100 byte records. The first 10 bytes of each record is the key and the rest is the value."
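The record layout quoted above (100-byte records, with the first 10 bytes as the key) can be illustrated with a small in-memory sort. This is only a sketch of the data format, not how Hadoop or any contest entry sorts; the function names and the use of random bytes as test input are assumptions made for illustration.

```python
import os

RECORD_SIZE = 100  # bytes per record, per the quoted benchmark description
KEY_SIZE = 10      # leading bytes of each record that form the key


def split_records(data: bytes) -> list[bytes]:
    """Cut a byte string into fixed-size 100-byte records."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input length must be a multiple of RECORD_SIZE")
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def sort_records(data: bytes) -> bytes:
    """Sort records by their 10-byte key; the remaining 90 bytes ride along."""
    records = split_records(data)
    records.sort(key=lambda rec: rec[:KEY_SIZE])
    return b"".join(records)


def make_random_records(count: int) -> bytes:
    """Hypothetical helper: random bytes standing in for benchmark input."""
    return os.urandom(count * RECORD_SIZE)
```

Calling `sort_records(make_random_records(1000))` returns the same 100,000 bytes with the records reordered by key. A real entry has to do this across many machines and disks, which is where the difficulty lies.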

    --
    Belief is the currency of delusion.
  2. Re:When's it going to be 1.0? by Anonymous Coward · · Score: 5, Informative

    It's 0.20 but it's stable and production ready already. I use it with HBase and it scales awesomely.

  3. Not quite as impressive as it sounds by Sangui5 · · Score: 4, Informative

    Google's sorting results from last year (link) are much faster; they did a petabyte in 362 minutes, or about 2.8 TB/min. The minute sort didn't exist last year, but Google did 1TB in 68 seconds last year, so I think it may be safe to assume that they could do 1 TB in under a minute this year. Google just hasn't submitted any of their runs to the competition.
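The petabyte figures quoted in this thread can be turned into throughput with one division. The sketch below assumes 1 PB = 1000 TB (decimal units); the function name is illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def throughput_tb_per_min(terabytes: float, minutes: float) -> float:
    """Average sort throughput in terabytes per minute."""
    return terabytes / minutes


# Google: 1 PB in 362 minutes (figure from the comment above)
google_rate = throughput_tb_per_min(TB_PER_PB, 362)

# Yahoo/Hadoop: 1 PB in 16.25 hours (figure from the story summary)
yahoo_rate = throughput_tb_per_min(TB_PER_PB, 16.25 * 60)

print(f"Google: {google_rate:.2f} TB/min")
print(f"Yahoo:  {yahoo_rate:.2f} TB/min")
```

Under that assumption the rates come out to roughly 2.76 TB/min and 1.03 TB/min respectively. These are averages over the whole run and say nothing about cluster size or hardware.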

    On the sort benchmark page, they list the winning run as Yahoo's 100TB run, leaving out the 1PB run; that implies the 1PB run didn't conform to the rules, or was late, or something.

    People have commented that this is a "who has the biggest cluster" competition; the sort benchmark also includes the 'penny' sort, which is how much you can sort for 1 penny of computer time (assuming your machine lasts 3 years), and the 'Joule' sort, which is how much energy it takes you to sort a set amount of data. Not surprisingly, the big clusters appear to be neither cost efficient nor energy efficient.
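The penny sort budget described above can be sketched as simple amortization: spread the machine's price over a 3-year life and see how many seconds one penny buys. This follows only the comment's description; the official rules may define the details differently, and the machine price and sort rate below are made-up inputs.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seconds_per_penny(machine_cost_dollars: float,
                      lifetime_years: float = 3.0) -> float:
    """Seconds of machine time one penny buys, amortized over the lifetime."""
    total_pennies = machine_cost_dollars * 100
    return lifetime_years * SECONDS_PER_YEAR / total_pennies


def bytes_sorted_per_penny(machine_cost_dollars: float,
                           sort_rate_bytes_per_sec: float) -> float:
    """Data sorted within the one-penny time budget at a given sustained rate."""
    return seconds_per_penny(machine_cost_dollars) * sort_rate_bytes_per_sec


# Hypothetical machine: $1000, sustaining 50 MB/s while sorting.
budget = seconds_per_penny(1000)
sorted_bytes = bytes_sorted_per_penny(1000, 50e6)
```

For the hypothetical $1000 machine the budget is about 946 seconds per penny. A cheaper machine gets proportionally more time, which is why this category favors modest hardware over large clusters.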

  4. Re:Overlords - Trivia by e9th · · Score: 5, Informative

    Hadoop's name (and mascot) came from Doug [the project leader] Cutting's son's yellow stuffed elephant toy.