Slashdot Mirror


Data Sorting World Record — 1 Terabyte, 1 Minute

An anonymous reader writes "Computer scientists from the University of California, San Diego have broken the 'terabyte barrier' — and a world record — when they sorted more than a trillion bytes of data in 60 seconds. During this 2010 'Sort Benchmark' competition, a sort of 'World Cup of data sorting,' the UCSD team also tied a world record for fastest data sorting rate, sifting through one trillion data records in 172 minutes — and did so using just a quarter of the computing resources of the other record holder."

4 of 129 comments (clear)

  1. I used to think it was great by MichaelSmith · · Score: 4, Interesting

    I had a 6502 system with BASIC in ROM and a machine code monitor. The idea is to copy a page (256 bytes) from the BASIC ROM to the video card address space. This puts random characters into one quarter of the screen. Then bubble sort the 256 bytes. It took about one second.

    For extra difficulty do it again with the full 1K of video. Thats harder with the 6502 because you have to use vectors in RAM for the addresses. So reads and writes are a two step operation, as is incrementing the address. You have to test for carry. But the result was spectacular.

  2. Only 52 nodes by Gr8Apes · · Score: 4, Interesting

    You've got to be kidding me. Each node was only 2 quad core processors, with 16 500GB drives (big potential disk IO per node) but this system doesn't even begin to scratch the very bottom of the top 500 list.

    I just can't image that if even the bottom rung of the top 500 was even slightly interested in this record, that they wouldn't blow this team out of the water.

    --
    The cesspool just got a check and balance.
  3. Important Details.... by pdxp · · Score: 5, Interesting
    ... to make it clear that you won't be doing it (TritonSort, thanks for leaving that out kdawson) on your desktop at home:
    • 10Gbps Networking
    • 52 servers x 8 cores each = 416 CPUs
    • 24 GB RAM per server = 1,248 GB
    • ext4 filesystem on each, presumably with hardware raid

    I think this is cool, but.... how fast is it in a more practical situation?

    source

  4. Pretty close to theoretical max by melted · · Score: 3, Interesting

    Let's consider 100TB in 172 minute thing they also did. 52 nodes, 16 spindles per node is 832 spindles total and 120GB of data per spindle. 120GB of data can be read in 20 minutes and transfered in another 15 to the target spindles (assuming uniform distribution of keys). You can then break it down into 2GB chunks locally (again by key) as you reduce. Then you spend another hour and a half reading individual chunks, sorting them in memory, concatenating and writing.

    Of course this only works well if the keys are uniformly distributed (which they often are) and if data is already on the spindles (which it often isn't).