Slashdot Mirror


Google Sorts 1 Petabyte In 6 Hours

krewemaynard writes "Google has announced that they were able to sort one petabyte of data in 6 hours and 2 minutes across 4,000 computers. According to the Google Blog, '... to put this amount in perspective, it is 12 times the amount of archived web data in the US Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008.' The technology making this possible is MapReduce, 'a programming model and an associated implementation for processing and generating large data sets.' We discussed it a few months ago. Google has also posted a video from their Technology RoundTable discussing MapReduce."
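
To put the reported figures in per-machine terms, a back-of-the-envelope calculation helps. This is a rough sketch that assumes a decimal petabyte (10^15 bytes) and an even split of work across all 4,000 machines. The announcement states neither assumption.

```python
# Back-of-the-envelope throughput for the reported sort.
# Assumptions (not from the announcement): 1 PB = 10**15 bytes,
# and the work is spread evenly across all machines.
PETABYTE = 10**15                 # bytes, decimal definition (assumed)
MACHINES = 4_000
ELAPSED_S = 6 * 3600 + 2 * 60     # 6 hours 2 minutes, in seconds

aggregate_bps = PETABYTE / ELAPSED_S      # bytes/s across the whole cluster
per_machine_bps = aggregate_bps / MACHINES

print(f"aggregate:   {aggregate_bps / 1e9:.1f} GB/s")
print(f"per machine: {per_machine_bps / 1e6:.1f} MB/s")
```

This is only the rate at which sorted output is produced. Actual I/O per machine is higher, since a distributed sort typically reads each byte, sends it across the network, and writes it again.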

2 of 166 comments

  1. Re:That's Easy by Blakey Rat · Score: 5, Insightful

    I came here to post the same thing. If they sorted a petabyte of Floats, that might be pretty impressive. But if they're sorting 5-terabyte video files, their software really sucks.

    Not enough info to judge the importance of this.
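
The commenter's point is that the same byte count can represent very different amounts of sorting work. Counting records makes this concrete. The 4-byte float and 5 TB file sizes come from the comment. The decimal petabyte and the n·log2(n) comparison estimate are illustrative assumptions, not figures from Google's announcement.

```python
import math

PETABYTE = 10**15  # bytes (decimal definition assumed)

def records(record_bytes: int) -> int:
    """Number of fixed-size records that fit in one petabyte."""
    return PETABYTE // record_bytes

def comparisons(n: int) -> float:
    """Rough n * log2(n) comparison count for a comparison sort."""
    return n * math.log2(n) if n > 1 else 0.0

for label, size in [("4-byte floats", 4), ("5 TB video files", 5 * 10**12)]:
    n = records(size)
    print(f"{label:>16}: {n:,} records, ~{comparisons(n):.2e} comparisons")
```

The result is 2.5 × 10^14 floats versus just 200 video files. The second case is hardly a sort at all, which is why record size matters when judging the result.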

  2. Need to benchmark against the best sorts by Animats · Score: 4, Insightful

    Sorts have been parallelized and distributed for decades. It would be interesting to benchmark Google's approach against SyncSort. SyncSort is parallel and distributed, and has been heavily optimized for exactly such jobs. Using map/reduce will work, but there are better approaches to sorting.
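
For readers unfamiliar with how a sort fits into map/reduce, one common formulation works in three steps. First, sample the keys to choose range boundaries. Second, have the map step route each record to the reducer that owns its key range. Third, let each reducer sort its own range. Concatenating the reducer outputs in order then gives a globally sorted result. The sketch below simulates this in a single process. The sampling strategy, reducer count, and function names are illustrative assumptions. This is not Google's implementation or SyncSort's.

```python
import bisect
import random

def choose_splitters(records, num_reducers, sample_size=1000, seed=0):
    """Pick num_reducers - 1 range boundaries from a random sample of keys."""
    if not records:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def map_phase(records, splitters):
    """'Map': route each record to the reducer that owns its key range."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for r in records:
        partitions[bisect.bisect_right(splitters, r)].append(r)
    return partitions

def reduce_phase(partitions):
    """'Reduce': each reducer sorts its own range independently."""
    return [sorted(p) for p in partitions]

def distributed_sort(records, num_reducers=4):
    splitters = choose_splitters(records, num_reducers)
    sorted_parts = reduce_phase(map_phase(records, splitters))
    # Ranges are disjoint and ordered, so concatenation is globally sorted.
    return [r for part in sorted_parts for r in part]

if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(10_000)]
    assert distributed_sort(data) == sorted(data)
```

Range partitioning is what makes the reducer outputs concatenable. Hash partitioning, a common default in MapReduce systems, would leave each reducer's output sorted but not the whole. A sort-specialized tool can tune the sampling, merging, and I/O steps more aggressively than a general framework, which is the comparison the comment asks for.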