Microsoft Research Introduces Record-Beating MinuteSort Tech
mikejuk writes "A team from Microsoft Research has taken the lead in the MinuteSort data sorting test using a specially devised technology: Flat Datacenter Storage. The figures are impressive: 1401 gigabytes in 60 seconds, using 1033 disks across 250 machines. Not only is this three times as much as the previous record, but also, it uses only one sixth of the hardware resources, according to a blog post about the test from Microsoft. One thing that's interesting about the success is the technology used. While solutions such as Hadoop and MapReduce are traditionally used for working with large data sets, Microsoft Research created its own technology called 'Flat Datacenter Storage,' or FDS for short. This isn't just academic research, of course. The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use it in other Microsoft technologies."
Their support for research and innovation is top-notch. They are pretty much the only one of the large companies that funds this kind of research, and they fund it with billions. Their work does lots of good for the world. Good job, guys.
Sorted by Microsoft
...yet MinuteSort still takes a minute!
My God can beat up your God. Just kidding...don't take offense. I know there's no God.
Not only is this three times as much as the previous record, but also, it uses only one sixth of the hardware resources, according to a blog post about the test from Microsoft.
The important part is not that this is a new approach, but that they beat the previous record using less hardware.
The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use it in other Microsoft technologies.
So Bing is going to scrape their search results from Google *and* other search engines? :-)
It must have been something you assimilated. . . .
Did they actually do anything, or just build a machine using today's hardware and lots of funding? A team from Yahoo got the record in 2009. Hardware has changed a lot in the 3 years since, and when money is no object, couldn't anyone do about the same?
It only works using IE6.
More irrational Microsoft hatred from the peanut gallery. Interesting accomplishment from Microsoft Research (a group which has produced all kinds of useful advances in computing and software development, and which has very little to do with shipped products like Outlook, IE6, etc.); Average /. luser interpretation? LOL SHILL ARTICLE FROM TEH MICRO$OFT FAGGORTZ YOU SUCK LOL.
Good to see that a nerd site is inundated with droves of empty-headed group-think religious fanatics!
When you're done masturbating to your imaginary universe, maybe you'd like to sit down with the likes of Simon Peyton-Jones and discuss some of the finer points of the terrible work he and his peers have been doing.
Baa-hahahaha. Right.
"they developed a different way of referencing the massive amount"
Different doesn't mean much. I could do something different; that doesn't mean it is any better or any worse.
They say less hardware, and sure, maybe quantity-wise, but not by much:
Microsoft - 1033 disks
Yahoo - 1406 disks
Difference: 373 disks. That alone, I would say, would just be from advancements made in hard drives.
Yahoo nodes:
2x quad-core Xeons, 8 GB of RAM (assuming DDR2, which was current at the time)
1 Gb Ethernet port on each node
40 nodes per rack
Microsoft:
2 - 12 cores a cluster
24 GB - 96 GB of RAM (assuming DDR3)
10 Gb Ethernet ports
78% were 10,000 rpm disks
crazy interconnects
So the hardware quantity may be lower, but again, given that their hardware is so much more advanced, I don't see that they actually did anything.
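To put a number on the disk-count argument above, here is a rough sketch of data sorted per disk for each run. The Microsoft figures are the ones quoted in the summary; the size of the previous record is not stated in this thread, so it is inferred from "three times as much" and should be treated as an assumption.

```python
# Figures quoted in the story summary and the comment above.
MS_GB = 1401          # gigabytes sorted in 60 seconds
MS_DISKS = 1033
YAHOO_DISKS = 1406

# Assumption: the previous record is taken as one third of the new one,
# inferred from "three times as much"; the exact figure is not in the thread.
YAHOO_GB_ASSUMED = MS_GB / 3


def gb_per_disk(total_gb, disks):
    """Gigabytes sorted per disk within the one-minute window."""
    return total_gb / disks


ms_per_disk = gb_per_disk(MS_GB, MS_DISKS)
yahoo_per_disk = gb_per_disk(YAHOO_GB_ASSUMED, YAHOO_DISKS)
ratio = ms_per_disk / yahoo_per_disk

print(f"Microsoft: {ms_per_disk:.2f} GB per disk")
print(f"Yahoo (assumed size): {yahoo_per_disk:.2f} GB per disk")
print(f"Per-disk ratio: {ratio:.1f}x")
```

Under that assumption the per-disk figure differs by roughly a factor of four. Whether that gap comes from faster disks and networking or from the software design is exactly what the comment is questioning; the arithmetic alone cannot settle it.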
That's what I'm saying. I actually think what Yahoo did was amazing given the hardware they used; this is just a disgusting purchase to the top. I'm not a fan of Microsoft or Yahoo. I just don't like it when people are all "Microsoft is the greatest" when all they did was open their wallet.
You know what's also convenient? Out of the different tests, they beat the only one that was held by Yahoo. I try not to think this way, but it makes me wonder.
Website: http://sortbenchmark.org/
PDF: MinuteSort with Flat Datacenter Storage
The sorts were accomplished using a heterogeneous cluster consisting of 256 computers and 1,033 disks, divided broadly into two classes: storage nodes and compute nodes. Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because files are all remote, our 1,470 GB runs actually transmitted 4.4 TB over the network in under a minute. No strong assumptions are made around key or record lengths; keys and records of other lengths can be handled with only a performance-neutral configuration change.
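The network figures in the excerpt above can be checked with a little arithmetic. This is a rough sketch using only the numbers quoted there; it assumes decimal units (1 TB = 1000 GB) and a full 60 seconds, so the rates are lower bounds, and the per-machine figure is a plain average that ignores the split between storage and compute nodes.

```python
# Numbers quoted in the excerpt.
DATA_GB = 1470      # size of the sort run
NETWORK_TB = 4.4    # total traffic over the network
MACHINES = 256
SECONDS = 60        # "under a minute", so real rates are at least this high

# Assumption: decimal units, 1 TB = 1000 GB.
network_gb = NETWORK_TB * 1000

# How many times each byte crossed the network on average.
amplification = network_gb / DATA_GB

# Aggregate and average per-machine rates in gigabits per second.
aggregate_gbit_s = network_gb * 8 / SECONDS
per_machine_gbit_s = aggregate_gbit_s / MACHINES

print(f"Network traffic / data size: {amplification:.2f}x")
print(f"Aggregate rate: {aggregate_gbit_s:.0f} Gb/s")
print(f"Average per machine: {per_machine_gbit_s:.2f} Gb/s")
```

The traffic comes out at about three times the data size, which fits a design where everything is read remotely, exchanged between sort machines, and written back remotely.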
Summary
FDS is a general-purpose scalable parallel blob store that exploits a full-bandwidth interconnect to expose the entire cluster’s disk bandwidth to remote clients. The sort performance results in this paper demonstrate the power of the architecture: in both Daytona and Indy sorts, the system reads the data remotely to the sort machines, sorts the data across the network, and writes it remotely back to storage.
Performant remote file access imparts a flexibility absent in contemporary distributed storage systems. Beyond sort, FDS supports a broad variety of scalable large-data applications. It does so without demanding that cluster nodes balance compute and disk performance; more importantly, it does so without demanding that applications observe locality constraints.
tomorrow who's gonna fuss
Could someone knowledgeable comment on their "tract locator table" (or TLT) metadata system and its possible relation to P2P protocols? If BitTorrent didn't focus on peer speed as measured by reads and writes, couldn't it gain an advantage using this? The TLT is expected to have consistent membership, but if it was updated once a minute (say), wouldn't that be enough to get the advantages without it taking too long to join a group?
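For anyone unsure what a table-driven lookup like this looks like, here is a minimal sketch. It assumes the row is chosen as (hash of the blob id + tract number) mod table length; that formula, the use of SHA-1, and the server names are all assumptions for illustration and are not taken from this thread.

```python
import hashlib

# Hypothetical table: each row names the server holding tracts that map to it.
TLT = ["server-a", "server-b", "server-c", "server-d"]


def tract_locator(blob_id: str, tract_number: int, table_length: int) -> int:
    """Row index for a tract, assuming (hash(blob_id) + tract_number) mod length."""
    digest = hashlib.sha1(blob_id.encode("utf-8")).digest()
    blob_hash = int.from_bytes(digest[:8], "big")
    return (blob_hash + tract_number) % table_length


def server_for(blob_id: str, tract_number: int) -> str:
    """Resolve a tract to a server using only the locally cached table."""
    return TLT[tract_locator(blob_id, tract_number, len(TLT))]


# Consecutive tracts of one blob walk through consecutive rows of the table.
for i in range(4):
    print(i, server_for("example-blob", i))
```

The point of such a scheme is that a client holding a copy of the table can compute where any tract lives without asking a metadata server on every read or write. That is also why membership churn matters for the P2P idea: every change to the table has to reach every client before lookups agree again.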
tomorrow who's gonna fuss
I think it's unfair to say that they are the only company funding this sort of research. Plenty of research is done by other companies such as Intel, IBM and Google. Granted, since (as usual) the real issue being debated here seems to be whether Microsoft is evil or not, I'd have to say that the answer is a resounding no. I applaud this accomplishment. I still despise their products and general philosophy, but credit should be given where credit is due, and this deserves credit. I think this development sounds really cool and I hope that their research department continues to delve into interesting issues like this. Time alone will tell what will come from this, whether I like the company or not.
~theCzar
It's rare that I hear much about what Microsoft does in the basic research areas.
They used 10 GigE with a very advanced set of switches that support OpenFlow so that they could get the full bisection bandwidth. They could have used InfiniBand and probably done much better with FDR adapters capable of 56 gigabits per second. Even "old" IB adapters were faster. Most of the IB switches supported full bisection bandwidth right out of the box. MS should look at the High Performance Computing world. They need to handle large amounts of data with low latency.
-- soldack
MSFT Research has been a leader there for a decade. The technical program was just announced Tuesday.