Open Source Solution Breaks World Sorting Records
allenw writes "In a recent blog post, Yahoo's grid computing team announced that Apache Hadoop was used to break the current world sorting records in the annual GraySort contest. It topped the 'Gray' and 'Minute' sorts in the general purpose (Daytona) category. They sorted 1TB in 62 seconds, and 1PB in 16.25 hours. Apache Hadoop is the only open source software to ever win the competition. It also won the Terasort competition last year."
I for one welcome our new datasorting overlords!
The Long Now Foundation
My sort will totally beat yours!
$ make available
So, it appears they have finally sorted out whether open source beats proprietary.
When information is power, privacy is freedom.
If it's winning competitions at 0.20, when will they release it?
...this cluster had nearly 4 times the number of nodes as the previous records. This competition was testing who had more nodes working together the best, but when you have so many more nodes, it would be hard not to top other clusters.
OK, so where are the "Java is slow" comments? o.O
Probably why the second sentence in the article is "All of the sort benchmarks measure the time to sort different numbers of 100 byte records. The first 10 bytes of each record is the key and the rest is the value."
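For anyone who wants to see what that record layout means in practice, here is a minimal sketch. It assumes keys are compared bytewise and that the whole input fits in memory, neither of which is claimed for the actual Hadoop run; the names `split_records` and `sort_records` are made up for illustration.

```python
import os

RECORD_SIZE = 100  # bytes per record, per the article
KEY_SIZE = 10      # leading bytes of each record that form the key


def split_records(data: bytes) -> list:
    """Cut a byte buffer into fixed-size 100-byte records."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def sort_records(data: bytes) -> bytes:
    """Return the records ordered by their 10-byte key.

    Assumption: keys are compared bytewise, which is how Python orders bytes.
    """
    records = split_records(data)
    records.sort(key=lambda rec: rec[:KEY_SIZE])
    return b"".join(records)


if __name__ == "__main__":
    unsorted = os.urandom(RECORD_SIZE * 1000)  # 1000 random records
    result = sort_records(unsorted)
    keys = [rec[:KEY_SIZE] for rec in split_records(result)]
    print(keys == sorted(keys))
```

The hard part of the benchmark is doing this across thousands of machines and disks, which this single-process sketch does not attempt.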
Belief is the currency of delusion.
I've asked lots of interview candidates to implement randomSort. They've never heard of it, so then I describe the algorithm.
Watching their eyes go wide is the highlight of the interview, typically.
Occasionally some person who has overcome their interview nervousness will, with eager honesty, try to impress upon me that this is not a very good sort algorithm, and that much better ones are taught in universities these days.
Good Times.
My opinions are my own, and do not necessarily represent those of my employer.
Also, you can't patent software in Europe
Not yet, but they are working on it. They tried to sneak it through by hiding it in the amendments of an agricultural bill. Luckily Poland kept watch and raised a stink about it.
It's not over. There is too much money to be gained for that.
This doesn't say anything if we don't know what kind of records were supposed to be sorted.
It's amazing what you can learn if you actually RTFA.
All of the sort benchmarks measure the time to sort different numbers of 100 byte records.
If that's not good enough for you, post your email address and maybe someone will be kind enough to send you the 100TB and 1PB data files they used.
Dual Opteron < $600
Google's sorting results from last year (link) are much faster; they did a petabyte in 362 minutes, or about 2.8 TB/min. The minute sort didn't exist last year, but Google did 1TB in 68 seconds last year, so I think it may be safe to assume that they could do 1 TB in under a minute this year. Google just hasn't submitted any of their runs to the competition.
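A quick check of the arithmetic, using only the figures quoted in this thread (1PB in 362 minutes for Google, 1PB in 16.25 hours for Yahoo). It assumes decimal units, so 1 PB = 1000 TB; the function name is just for illustration.

```python
PB_IN_TB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def throughput_tb_per_min(terabytes: float, minutes: float) -> float:
    """Average sort throughput in terabytes per minute."""
    return terabytes / minutes


google = throughput_tb_per_min(PB_IN_TB, 362)         # 1 PB in 362 minutes
yahoo = throughput_tb_per_min(PB_IN_TB, 16.25 * 60)   # 1 PB in 16.25 hours

print(f"Google: {google:.2f} TB/min")  # Google: 2.76 TB/min
print(f"Yahoo:  {yahoo:.2f} TB/min")   # Yahoo:  1.03 TB/min
```

So the petabyte figure works out to a bit under 3 TB per minute, not per second.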
On the sort benchmark page, they list the winning run as Yahoo's 100TB run, leaving out the 1PB run; that implies the 1PB run didn't conform to the rules, or was late, or something.
People have commented that this is a "who has the biggest cluster" competition; the sort benchmark also includes the 'penny' sort, which is how much can you sort for 1 penny of computer time (assuming your machine lasts 3 years), and 'Joule' sort, how much energy does it take you to sort a set amount of data. Not surprisingly, the big clusters appear to be neither cost efficient nor energy efficient.
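To make the penny sort idea concrete: if a machine's price is spread evenly over a 3 year life, a penny buys a fixed slice of running time, and the score is how much you can sort in that slice. The sketch below only works from that description; the $600 price is a hypothetical input, and the official rules may define the cost differently.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
LIFETIME_YEARS = 3  # machine lifetime assumed in the comment above


def penny_time_budget_seconds(system_price_dollars: float) -> float:
    """Seconds of machine time that one penny buys.

    Assumption: the purchase price is amortized evenly over the lifetime.
    """
    lifetime_seconds = LIFETIME_YEARS * SECONDS_PER_YEAR
    cost_per_second = system_price_dollars / lifetime_seconds
    return 0.01 / cost_per_second


if __name__ == "__main__":
    # Hypothetical $600 machine: the penny buys roughly 26 minutes.
    print(penny_time_budget_seconds(600.0))
```

A cheaper box gets a longer time budget, which is why big clusters do poorly on this metric.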
Here in the UK, the patent office has been issuing software patents for some time in "anticipation" of them becoming legal at some point in the future.
No, I don't understand that either.
Why isn't this illegal - adding unrelated legislation to a bill? Is there anywhere in the world where this practice is not permitted, or better yet, prosecuted?
The GP is confusing a bunch of things. First, the Council of Ministers threw out all limiting amendments from the European Parliament and reached a Political Agreement on a shoddy text through backdoor maneuvering by Germany and the European Commission. That text would have turned the European Patent Office's practice of granting software patents into EU legislation.
A Political Agreement has neither juridical nor legislative value, but it has never happened that a political agreement was later annulled and negotiations were reopened. So also in this case, even though the German, Dutch, Spanish and Danish parliaments afterwards passed motions asking to reopen the discussions, the Council's bureaucrats did not want to do that because it "would undermine the efficiency of the decision making process".
Anyway, once you have a Political Agreement (which is reached by the representatives of the ministries responsible for the matter at hand) and nobody "wants" to discuss it anymore, the agreement can be placed as an "A item" on any EU Council of Ministers meeting, since it only needs rubber stamping in that case. In the case of the Software Patents Directive, it appeared several times as an A item on the agenda of an Agriculture and Fisheries meeting (which is presumably where the GP's confusion stems from).
In principle, there would have been nothing wrong with that, but in this case there was no actual political agreement, and in particular Poland was very unhappy with the way it had been treated. So 4 times in a row, Poland either had this "A item" removed from the agenda (sometimes at the last minute, because the responsible Polish minister had to be informed that they were again trying to get it through at a meeting he had no business with), or turned it into a "B item", which means that it can't be rubber stamped but that they first have to talk a bit about it (which nobody wanted to do).
In the end it still did get approved, but that whole circus helped convince the EU Parliament to table a resolution asking the Commission to restart the directive's process and, when the Commission refused, to later squarely reject it.
You can find some more of my thoughts on the Council's behaviour here.
Donate free food here
Bogosort: for when you are paid by the hour, but aren't penalised for being late.
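For the uninitiated, a minimal sketch of bogosort as it is usually described: shuffle until the list happens to be in order. The `rng` parameter and the shuffle counter are my own additions so the run can be made repeatable and the damage measured.

```python
import random


def is_sorted(items) -> bool:
    """True if every element is <= the one after it."""
    return all(items[i] <= items[i + 1] for i in range(len(items) - 1))


def bogosort(items, rng=None):
    """Shuffle a copy of items until it is sorted.

    Returns the sorted list and the number of shuffles it took.
    For n distinct items the expected number of shuffles is about n!,
    so keep the input tiny.
    """
    rng = rng or random.Random()
    result = list(items)
    shuffles = 0
    while not is_sorted(result):
        rng.shuffle(result)
        shuffles += 1
    return result, shuffles


if __name__ == "__main__":
    ordered, tries = bogosort([3, 1, 4, 1, 5])
    print(ordered, "after", tries, "shuffles")
```

An input that is already sorted returns after zero shuffles, which is the only case where it beats everything else.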
with my luck, bogosort would get it right the first time.
No, he clearly changed roles from developer to Evil HR. He's probably directly subservient to Catbert.
Hadoop's name (and mascot) came from Doug [the project leader] Cutting's son's yellow stuffed elephant toy.
"Why isn't this illegal"
Because they made it legal by passing it on a Totally Unrelated Bill.