MIT May Have Just Solved All Your Data Center Network Lag Issues

Posted by Unknown on Thursday July 17, 2014 @09:53AM from the hierarchy-beats-anarchy dept.

alphadogg (971356) writes A group of MIT researchers say they've invented a new technology that should all but eliminate queue length in data center networking. The technology will be fully described in a paper presented at the annual conference of the ACM Special Interest Group on Data Communication. According to MIT, the paper will detail a system — dubbed Fastpass — that uses a centralized arbiter to analyze network traffic holistically and make routing decisions based on that analysis, in contrast to the more decentralized protocols common today. Experimentation done in Facebook data centers shows that a Fastpass arbiter with just eight cores can be used to manage a network transmitting 2.2 terabits of data per second, according to the researchers.

Re:Yawn by Anonymous Coward · 2014-07-17 10:22 · Score: 3, Informative

A link to the paper is in the first article link. Direct link here. They also have a Git repo to clone, if you're interested.
Re:rfc1925.11 proves true, yet again by Archangel+Michael · 2014-07-17 10:34 · Score: 5, Interesting

Your 300 x 10Gb ports on 50 servers is ... not efficient. Additionally, you're not likely saturating your 60Gb off a single server, and you're running those six 10Gb connections per server to try to eliminate other issues you have, without understanding them. Your speed issues are elsewhere (likely SAN or database, or both), and not in the 50 servers. In fact, you might be exasperating the problem.
BTW, our data center core is running twin 40Gb connections for 80Gb total network load, but we're not really seeing anything using 10Gb off a single node yet, except the SAN. Our Metro Area Network links are being upgraded to 10Gb as we speak. "The network is slow" is not really an option.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:scalability? by Anonymous Coward · 2014-07-17 10:34 · Score: 3, Insightful

FTA: “This paper is not intended to show that you can build this in the world’s largest data centers today,” said Balakrishnan. “But the question as to whether a more scalable centralized system can be built, we think the answer is yes.”
Re:rfc1925.11 proves true, yet again by chuckugly · 2014-07-17 11:08 · Score: 5, Funny

In fact, you might be exasperating the problem.
I hate it when my problems get angry, it usually just exacerbates things.
Re:rfc1925.11 proves true, yet again by Anonymous Coward · 2014-07-17 13:03 · Score: 5, Informative

This is about zero in-plane queuing, not zero queuing. There is still a queue on each host; the advantage of this approach is obvious to anyone with knowledge of network theory (i.e., not you). Once a packet enters an Ethernet forwarding domain, there is very little you can do to reorder or cancel it. If you instead only send from a host when there is an uncongested path through the forwarding domain, you can reorder packets before they are sent, which lets you, for example, insert high-priority packets at the front of the queue and bucket low-priority traffic until there is a lull in the network.
Bandwidth is always limited at the high end. Technology and cost always limit the peak throughput of a fully cross-connected forwarding domain. That's why the entire internet isn't a 2-billion-way crossbar switch.
Furthermore, you can't install 6x 10-gigabit ports in a typical server; they just don't have that much PCIe bandwidth. You might also want to look at how much a 300-port 10GigE non-blocking switch really costs, then multiply that by 1000 to see how much it would cost Facebook to have a 300k-node DC with those. Then you can start to appreciate why they are looking at software approaches to optimise the bandwidth and latency of their networks with cost-effective resources, considering that their network loads, like everyone else's, never look like the theoretical worst case of every node transmitting continuously to random other nodes.
Real network loads have shapes, and if you are able to understand those shapes, you can make considerable cost savings. It's called engineering, specifically traffic engineering.
-puddingpimp
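
For anyone who wants the host-side queuing point above in concrete form, here is a minimal Python sketch. It is not Fastpass code: the `HostQueue` class, its method names, and the idea of an arbiter handing out a number of `grants` are all assumptions made up for illustration. It only shows that packets held on the host can still be reordered by priority before they enter the forwarding domain.

```python
import heapq
from itertools import count


class HostQueue:
    """Hypothetical host-side queue: packets wait here until a path is granted."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within one priority

    def enqueue(self, packet, priority):
        # Lower number means higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def release(self, grants):
        """Send up to `grants` packets, highest priority first."""
        sent = []
        while self._heap and len(sent) < grants:
            _, _, packet = heapq.heappop(self._heap)
            sent.append(packet)
        return sent


q = HostQueue()
q.enqueue("bulk-1", priority=5)
q.enqueue("bulk-2", priority=5)
q.enqueue("rpc-1", priority=0)   # arrives last, but jumps the queue

first = q.release(grants=2)      # the (assumed) arbiter granted two slots
rest = q.release(grants=5)       # a later lull drains the bulk traffic
print(first, rest)
```

The reordering is only possible because nothing has left the host yet; once the packets are in switch buffers, the late-arriving high-priority packet would sit behind the bulk traffic.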
Re:They re-invented static scheduling by postbigbang · 2014-07-17 13:25 · Score: 4, Informative

Nah. They put MPLS logic (deterministic routing by knowing the domain) into an algorithm that optimizes time slots, too.
All the hosts are known, along with their time costs and how much crap they jam into wires. It's pretty simple to typify what's going on, and where the packet parking lots are. If you have sufficient paths and bandwidth in and among the hosts, you resolve the bottlenecks.
This only works, however, if and when the domain of hosts has sufficient aggregate resources in terms of path availability among the hosts. Otherwise, it's the classic crossbar problem looking for a spot marked "oops, my algorithm falls apart when all paths are occupied."
Certainly it's nice to optimize, and there's plenty of room for algorithms that know how to sieve the traffic. But traffic is random, and pathways are limited. Defying the laws of physics will be difficult unless you control congestion in aggregate from applications where you can make the application become predictable. Only then, or if you have a crossbar matrix, will there be no congestion. For any questions on this, look to the Van Jacobson algorithms and what the telcos had to figure out, eons ago.

--
---- Teach Peace. It's Cheaper Than War.
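
The time-slot idea in the comment above can be illustrated with a small Python sketch. This is an assumption-laden toy, not the algorithm from the paper: `allocate_timeslots` is a hypothetical name, the fabric is assumed to be non-blocking so that only the endpoints constrain a slot, and the matching is a simple greedy pass. Each slot lets any host send at most one packet and receive at most one packet.

```python
from collections import deque


def allocate_timeslots(demands):
    """Greedily pack (src, dst) packet demands into time slots.

    Assumption: a non-blocking fabric, so the only constraint is that each
    source and each destination appears at most once per slot.
    """
    pending = deque(demands)
    slots = []
    while pending:
        busy_src, busy_dst = set(), set()
        slot, deferred = [], deque()
        for src, dst in pending:
            if src in busy_src or dst in busy_dst:
                deferred.append((src, dst))   # endpoint taken, wait for a later slot
            else:
                busy_src.add(src)
                busy_dst.add(dst)
                slot.append((src, dst))
        slots.append(slot)
        pending = deferred
    return slots


# Four packets between two senders and two receivers fit in two slots.
slots = allocate_timeslots([("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")])

# Three senders all targeting one receiver cannot be parallelised at all.
incast = allocate_timeslots([("A", "D"), ("B", "D"), ("C", "D")])
print(len(slots), len(incast))
```

The incast case is the point made above about limited pathways: when every demand needs the same endpoint, scheduling moves the waiting out of the network and onto the hosts, but it does not create capacity.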