Google Sorts 1 Petabyte In 6 Hours
krewemaynard writes "Google has announced that they were able to sort one petabyte of data in 6 hours and 2 minutes across 4,000 computers. According to the Google Blog, '... to put this amount in perspective, it is 12 times the amount of archived web data in the US Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008.' The technology making this possible is MapReduce, 'a programming model and an associated implementation for processing and generating large data sets.' We discussed it a few months ago. Google has also posted a video from their Technology RoundTable discussing MapReduce."
You do have to merge them all back together at the end...
Technically speaking, that's not true. In fact, you wouldn't want to.
Assuming some sort of search paradigm, you'd keep the records on their 4000 separate servers, each server running its own search, and *only* merge the results of the searches as needed and cache them in the web layer.
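To make that concrete, here is a rough sketch of the scatter-gather idea in Python. Everything in it is made up for illustration: `SHARDS` stands in for the separate servers, `search_shard` and `search` are hypothetical names, and `lru_cache` is only a stand-in for a cache in the web layer. It says nothing about how Google actually does it.

```python
import heapq
import itertools
from functools import lru_cache

# Hypothetical data: each inner tuple stands in for one server's local records.
SHARDS = (
    ("apple pie", "banana bread", "apple tart"),
    ("cherry cake", "apple crumble"),
    ("banana split", "date loaf"),
)

def search_shard(records, term, k):
    """Search one shard locally and return at most k matches, sorted."""
    hits = sorted(r for r in records if term in r)
    return hits[:k]

@lru_cache(maxsize=1024)  # assumption: stand-in for the web-layer cache
def search(term, k=3):
    """Send the query to every shard, then merge only the small result lists."""
    partials = [search_shard(records, term, k) for records in SHARDS]
    # heapq.merge lazily merges inputs that are already sorted,
    # so the full data set is never brought together in one place.
    merged = heapq.merge(*partials)
    return tuple(itertools.islice(merged, k))

if __name__ == "__main__":
    print(search("apple"))
```

The point is that only the per-shard top-k lists cross the network, so the merge cost depends on the number of shards and on k, not on the petabyte sitting underneath.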
My computer runs Vista just fin.... Wait a second. I see what you did there! You are making a joke! That is hilarious! I have never heard this joke before. How witty and original! I got one for you... watch this.... M$. See that! I put a $ instead of an S! You saw it here first!
Similes are like metaphors
Hell, you can only read a terabyte hard disk a few times before you encounter unrecoverable errors.
Umm, what?
Dewey, what part of this looks like authorities should be involved?