Google Sorts 1 Petabyte In 6 Hours

Posted by Soulskill on Sunday November 23, 2008 @04:53AM from the sort-of-fast dept.

krewemaynard writes "Google has announced that they were able to sort one petabyte of data in 6 hours and 2 minutes across 4,000 computers. According to the Google Blog, '... to put this amount in perspective, it is 12 times the amount of archived web data in the US Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008.' The technology making this possible is MapReduce, 'a programming model and an associated implementation for processing and generating large data sets.' We discussed it a few months ago. Google has also posted a video from their Technology RoundTable discussing MapReduce."
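
For a sense of scale, the figures in the summary can be turned into a throughput estimate. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes) and that the 4,000 machines shared the work evenly, neither of which the summary states. The variable names are illustrative.

```python
# Figures quoted in the summary above.
PETABYTE = 10**15                 # assumption: decimal petabyte
elapsed_s = 6 * 3600 + 2 * 60     # 6 hours 2 minutes, in seconds
machines = 4000

# Sorted bytes produced per second, overall and per machine.
aggregate = PETABYTE / elapsed_s
per_machine = aggregate / machines  # assumption: even split across machines

print(f"elapsed: {elapsed_s} s")
print(f"aggregate: {aggregate / 1e9:.1f} GB/s")
print(f"per machine: {per_machine / 1e6:.1f} MB/s")
```

That works out to roughly 46 GB/s of sorted output overall, or about 11.5 MB/s per machine. It is an end-to-end figure, not a disk speed: a sort of this size reads and writes the data more than once.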

11 of 166 comments

Kudos to Google by Anonymous Coward · 2008-11-23 04:54 · Score: 5, Funny

for knowing how important the Library of Congress metric is to us nerds!
Re:Kudos to Google by canuck57 · 2008-11-23 06:30 · Score: 5, Funny

"for knowing how important the Library of Congress metric is to us nerds!"
But at least now we know Google can sort out petafiles.
Re:That's Easy by Blakey Rat · 2008-11-23 05:08 · Score: 5, Insightful

I came here to post the same thing. If they sorted a petabyte of Floats, that might be pretty impressive. But if they're sorting 5-terabyte video files, their software really sucks.
Not enough info to judge the importance of this.

--
Comment of the year
Re:That's Easy by farker haiku · 2008-11-23 05:16 · Score: 5, Informative

I think this is the data set. I could be wrong though. The article (yeah yeah) says that
In our sorting experiments we have followed the rules of a standard terabyte (TB) sort benchmark.
Which led me to this page that describes the data (and it's available for download).

--
Your sig(k) has been stolen. There is a puff of smoke!
Re:That's Easy by Anonymous Coward · 2008-11-23 05:16 · Score: 5, Informative

From TFA: they sorted "10 trillion 100-byte records"
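
The quoted figure checks out as exactly one decimal petabyte, and it also shows what is being sorted: small fixed-size records rather than large files. Below is a toy, single-machine sketch of sorting such records. The 10-byte key length is an assumption taken from the standard sort benchmark's record layout, not from this thread, and `make_records` and `sort_records` are hypothetical helper names. It says nothing about how Google's MapReduce implementation does the job.

```python
import random

RECORD_SIZE = 100  # bytes per record, as quoted from the article
KEY_SIZE = 10      # assumption: key length under the benchmark rules; not stated in this thread

# Sanity check on the quoted figure: 10 trillion records of 100 bytes each.
total_bytes = 10 * 10**12 * RECORD_SIZE
print(total_bytes == 10**15)  # True: exactly one (decimal) petabyte

def make_records(count, seed=0):
    """Return `count` pseudo-random records packed into one bytes object."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(count * RECORD_SIZE))

def sort_records(blob):
    """Sort packed fixed-size records by their leading KEY_SIZE bytes."""
    if len(blob) % RECORD_SIZE != 0:
        raise ValueError("input is not a whole number of records")
    records = [blob[i:i + RECORD_SIZE] for i in range(0, len(blob), RECORD_SIZE)]
    records.sort(key=lambda record: record[:KEY_SIZE])
    return b"".join(records)

# Toy run: 1,000 records instead of 10 trillion.
data = make_records(1000)
result = sort_records(data)
```

At a thousand records this fits in memory; the hard part of the real benchmark is that the data does not, so the work has to be spread across machines and disks.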
Finally... by aztektum · 2008-11-23 05:17 · Score: 5, Funny

I will be able to catalog my pr0n in my lifetime:
Blondes, Brunettes, Red heads, Beastial^H^H^H^H^H "Other"

--
:: aztek ::
No sig for you!!
Re:Sort? Sort what? by nedlohs · 2008-11-23 05:34 · Score: 5, Informative

I realize, slashdot..., but maybe you could glance at the article which states:
10 trillion 100-byte records
Re:tagging by gardyloo · 2008-11-23 05:53 · Score: 5, Funny

pr0n for Geeks, volume 18: Sorting On-the-Fly
Re:Sort? Sort what? by Dpaladin · 2008-11-23 06:12 · Score: 5, Funny

Sorting a petabyte sounds pretty impressive, but I don't think it was a whole yotta work.

--
Bad puns gave me bad karma. =(
Amazing feat... by Duncan3 · 2008-11-23 08:32 · Score: 5, Funny

Today from Google, the god of all things and doer of all things good in the universe, many millions of dollars in computer equipment were able to sort lots of things, in about the amount of time you would think it would take for millions of dollars of equipment to sort things.
In other news, a woodchuck was found chucking wood as fast as a woodchuck could chuck wood.
Congrats Google, you have a HUGE data set, and an even bigger wallet.

--
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Re:One ups Yahoo & Hadoop by jollyplex · 2008-11-23 11:49 · Score: 5, Interesting

Exactly. It's unclear if their better time was a software engineering or algorithmic feat, though. Hadoop was able to finish sorting the 1 TB benchmark dataset in 209 s; TFA states Google pulled the same event off in 68 s. The Yahoo blog post you linked to says their compute nodes each sported 4 SATA HDDs. Note TFA mentions Google's 1 PB dataset sort used 48,000 HDDs split between 4,000 machines, or 12 HDDs to a machine. If Google used the same machines to perform their 1 TB sort, then they had 3 times as many HDDs on each compute node, and could probably pull data from storage 3 times as fast. 209 s / 68 s ~ 3.1 -- coincidence, or not? =)
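
The arithmetic behind that comparison, as a quick sketch. It uses only the numbers quoted in the comment and, like the comment, assumes the 1 TB run used the same 12-disk machines as the 1 PB run. It ignores how many machines each side used, since the comment does not say. The variable names are illustrative.

```python
# Figures quoted in the comment above.
google_disks = 48_000
google_machines = 4_000
yahoo_disks_per_node = 4
hadoop_seconds = 209
google_seconds = 68

# Disks per machine on the Google side, and how that compares to Yahoo's nodes.
google_disks_per_machine = google_disks // google_machines
disk_ratio = google_disks_per_machine / yahoo_disks_per_node  # assumes same machines for the 1 TB run

# How much faster the Google 1 TB run was.
time_ratio = hadoop_seconds / google_seconds

print(f"disks per Google machine: {google_disks_per_machine}")
print(f"disk ratio: {disk_ratio:.1f}x")
print(f"time ratio: {time_ratio:.2f}x")
```

The disk ratio is exactly 3 and the time ratio is about 3.07, so the two do line up. Whether that is cause or coincidence cannot be settled from these numbers alone.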