ARPANET Co-Founder Calls for Flow Management
An anonymous reader writes "Lawrence Roberts, co-founder of ARPANET and inventor of packet switching, today published an article in which he claims to solve the congestion control problem on the Internet. Roberts says, contrary to popular belief, the problem with congestion is the networks, not Transmission Control Protocol (TCP). Rather than overhaul TCP, he says, we need to deploy flow management, and selectively discard no more than one packet per TCP cycle. Flow management is the only alternative to peering into everyone's network, he says, and it's the only way to fairly distribute Internet capacity."
He seems to agree. This surprised me but it seems that equipment can do this fairly.
Oh, from his company of course.
Larry Roberts was co-founder of the ARPAnet, but he did NOT invent packet switching. That invention goes to Donald Davies of the National Physical Laboratory in the UK. His work was well-credited by the ARPAnet designers.
The problem is, some people will start throwing away 2 packets instead of 1 so that they can get more "throughput" on more limited hardware. Someone else will compete by tossing 3, and the arms race for data degradation will begin.
Will this method really offset the retransmits it triggers? Only if not everyone does it, unless I'm missing something.
What might work better is scaled drops: if a router and its immediate peers are nearing capacity, they start to drop a packet per cycle, automatically causing the routers at their perimeter to route around the problem, easing up on their traffic.
However, it still seems like a system that an untrusted party could exploit to drop packets from non-preferred sources or to non-preferred destinations.
'I'm not big on networking but if I'm sending data to someone and some "flow management" dumps one of the packets, won't my computer or modem just resend it?'
Yes and when the retransmission occurs the router may be able to handle your packet. The router won't be overloaded forever after all.
The bigger part of the equation is that with TCP the more packets are dropped the slower you transmit packets. With this solution the heaviest transmissions would have more packets dropped and therefore be slowed down the most.
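To make the backoff argument concrete, here is a rough sketch of textbook additive-increase/multiplicative-decrease. It is not any real TCP stack: the function names, the starting window of 10, and the "halve on a drop, add one otherwise" constants are all assumptions for illustration.

```python
def aimd_step(cwnd, dropped, increase=1.0, decrease=0.5, floor=1.0):
    """One round trip of a simplified AIMD sender (hypothetical helper).

    On a drop the window is multiplied by `decrease`; otherwise it grows
    by `increase`. Real stacks add slow start, timeouts, and more.
    """
    if dropped:
        return max(floor, cwnd * decrease)
    return cwnd + increase


def simulate(rounds, drop_rounds, start=10.0):
    """Return the window size after each round; drop_rounds is a set of round indices."""
    cwnd = start
    history = []
    for r in range(rounds):
        cwnd = aimd_step(cwnd, r in drop_rounds)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    light = simulate(10, {5})        # one drop over ten round trips
    heavy = simulate(10, {2, 5, 8})  # three drops over the same period
    print("one drop:   ", light)
    print("three drops:", heavy)
```

Under these assumptions the flow that loses three packets finishes with a much smaller window than the flow that loses one, which is the effect the comment above is relying on.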
I admit, I'd have to check the details of the protocol to see if this is open to abuse by those with a modified TCP stack. The problem is that the packets are dropped in a predictable manner and a modified TCP stack could be designed to 'filter' the noise and yet still degrade when other packets are lost and provide a reliable connection.
Overhead. Right now, routers just track individual packets: receive a packet, look up the next-hop IP in the forwarding table (which might have 250,000 entries), and send it on its merry way. To do anything based on flows, routers would have to keep track of all the active flows, which amounts to all open TCP connections going through that router. For an active router, there would be millions of active flows at any one time, so the overhead would be huge. This would be like a NAT or stateful firewall device that could do line-rate forwarding at gigabit, 10G, or 100G port speeds.
You also have problems tracking flows; routes change, so while a router may be tracking an active flow, the flow may choose another path. The router has no way of knowing this, so it has to keep track of the flow until it times out (and the timeout would have to be more than just a few seconds).
There are flow-based router architectures, but they are not generally used for ISP core/edge routers because there are too many ways they can break.
Can someone explain why this hasn't already been implemented?
It has been implemented and abandoned already because it doesn't scale. Serious routers today use the concept of interface adjacency: for a given inbound packet there are only a few possible destinations, namely the interfaces on the router.
When a route is installed into the FIB, you can recursively follow that route until you find the egress interface and the layer 2 address of the next hop - those will typically never change! So long as the router always keeps this adjacency information up to date, individual packets never need to have a route lookup performed - the destination prefix is checked in the adjacency table, the layer 2 header is rewritten, and the packet is queued for egress on the appropriate interface.
This allows for substantially higher throughput (in packets per second) than other methods because the adjacency table can be cleverly stored in content-addressable memory that provides constant time answers. A prefix will be installed in a content-addressable memory circuit as a lookup key. The value associated with that key is a pointer into the adjacency table that holds the interface and layer 2 information for that prefix.
By reconsidering the routing problem, and by using some smart circuits, the route lookup for a single packet has been reduced from O(k) to O(1), where k is the length of the longest prefix. For IPv4, that's up to 32 bits, so you do a single fetch and lookup instead of 32 or so comparisons for each packet. At a million packets per second, that's a huge difference.
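For anyone who wants to see the two-level structure, here is a small software sketch of the prefix table pointing into an adjacency table. Everything in it is made up for illustration (the interface names, MAC addresses, and prefixes), and the loop over prefix lengths stands in for the work that a content-addressable memory would do in a single parallel match; it is not how any particular vendor implements it.

```python
import ipaddress

# Hypothetical adjacency table: (egress interface, next-hop layer 2 address).
ADJACENCIES = [
    ("eth0", "00:11:22:33:44:55"),
    ("eth1", "66:77:88:99:aa:bb"),
]


def prefix(cidr):
    """Turn '10.0.0.0/8' into a (network_as_int, prefix_length) key."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), net.prefixlen)


# Prefix key -> index into ADJACENCIES (the "pointer" in the comment above).
FIB = {
    prefix("10.0.0.0/8"): 0,
    prefix("10.1.0.0/16"): 1,
}


def lookup(dst):
    """Software longest-prefix match for IPv4.

    Tries /32 down to /0, so up to 33 probes per packet. A CAM answers
    the same question in one step by matching all prefixes at once.
    """
    addr = int(ipaddress.ip_address(dst))
    for length in range(32, -1, -1):
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        index = FIB.get((addr & mask, length))
        if index is not None:
            return ADJACENCIES[index]
    return None  # no matching prefix, packet would be dropped


if __name__ == "__main__":
    print(lookup("10.1.2.3"))     # matches the more specific /16
    print(lookup("10.2.3.4"))     # falls back to the /8
    print(lookup("192.168.1.1"))  # no route in this toy table
```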
Traditional flow-based routing requires creating in-memory structures for each flow, collectively called the flow cache. Each packet requires an initial full route lookup, which builds the structures for that flow. Then, subsequent packets in that flow can be matched against the cache and switched directly to the egress interface. This operation is much closer to that of a contemporary firewall. The good thing about this method is that it gives you a lot of visibility into the traffic. The bad side is that it requires a very large amount of memory for all of these structures. When that memory is exhausted, you can't route any more flows!
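A toy version of the flow cache described above might look like the following. It is only a sketch: the class and exception names are hypothetical, the flow key is assumed to be the usual 5-tuple, and real devices also age entries out with timers, which is left out here.

```python
class FlowCacheFull(Exception):
    """Raised when no memory is left for a new flow entry (hypothetical name)."""


class FlowCache:
    def __init__(self, capacity, route_lookup):
        self.capacity = capacity          # maximum number of tracked flows
        self.route_lookup = route_lookup  # slow path: full route lookup by destination
        self.flows = {}                   # 5-tuple -> egress interface
        self.full_lookups = 0             # how often the slow path was taken

    def forward(self, flow):
        """flow is assumed to be (src, dst, proto, sport, dport)."""
        if flow in self.flows:
            # Fast path: later packets of a known flow skip the route lookup.
            return self.flows[flow]
        if len(self.flows) >= self.capacity:
            # The failure mode from the comment: no room, so no new flows.
            raise FlowCacheFull("flow cache exhausted")
        self.full_lookups += 1
        egress = self.route_lookup(flow[1])
        self.flows[flow] = egress
        return egress


if __name__ == "__main__":
    cache = FlowCache(capacity=2, route_lookup=lambda dst: "eth0")
    web = ("192.0.2.1", "198.51.100.7", "tcp", 40000, 80)
    for _ in range(3):
        cache.forward(web)
    print("full lookups for 3 packets of one flow:", cache.full_lookups)
```

The memory cost is the point: every entry in `flows` exists for as long as the flow is tracked, so a router carrying millions of connections needs millions of entries.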
This comparison is a bit apples to oranges - the adjacency table described above is pretty much state-of-the-art for off-the-shelf gear, while the flow cache architecture is highly dated. But without some substantial advances in the ways flows are created, tracked, and expired, no flow router is going to reach the number of packets per second required for very large installations in the Internet.
Everyone's got their favorite experts and they are often a shortcut to lots of research you don't have time for. He's an independent expert who cares more about your rights than other things, happens to be an expert in OS design who's been working since the early 70s and knows something about networking as well. Finally, he likes to answer email.
This actually looks like a form of something a lot of Cisco equipment already does to prevent "synchronization."
Let's say you have 500 hosts sharing a "fat pipe." During peak times, the combined throughput used by TCP applications causes all available bandwidth on the link to be consumed. The result is that, at the instant all available bandwidth is consumed, packets get dropped suddenly and indiscriminately. This means that 500 hosts all lose a slew of packets.
Per TCP specifications, when packets aren't acknowledged, all 500 hosts back off for a moment, and then retransmit at approximately the same time, causing another sudden burst in bandwidth usage, and more dropped packets.
This problem compounds until all hosts are simply bursting packets, dropping packets, backing off, and repeating. The solution to this was a technique called "RED" (Random Early Detection).
Essentially, it detects when bandwidth is almost completely utilized, and then starts selectively and "fairly" dropping packets from the TCP streams. This causes the hosts to gradually back off, until bandwidth consumption is back in check. The result is that the whole "synchronization" issue is avoided, and the link is better utilized, as throughput is constant and reliable.
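Roughly, the drop decision can be sketched like this. This follows the general shape of RED as usually described (a smoothed average queue length and a drop probability that ramps up between two thresholds); the threshold values, the averaging weight, and the function names are assumptions for illustration, not Cisco's defaults or code.

```python
import random


def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the averaged queue length.

    Below min_th nothing is dropped, at or above max_th everything is,
    and in between the probability rises linearly from 0 to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th=20, max_th=60, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop an arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


if __name__ == "__main__":
    for avg in (10, 30, 50, 70):
        print(avg, red_drop_probability(avg, 20, 60, 0.1))
```

Because the drops are random and start early, different hosts back off at different times instead of all at once, which is how the synchronization is avoided.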
There is a variation called WRED or "Weighted Random Early Detection", in which certain types of packets get cut before others. This would allow the router to avoid dropping VoIP traffic, while implementing RED on non-realtime streams instead.
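The weighted variant can be sketched by giving each traffic class its own thresholds. The class names and numbers below are invented for illustration; real equipment typically keys the profiles on packet markings such as IP precedence or DSCP.

```python
# Hypothetical per-class profiles: (min_th, max_th, max_p).
# Voice only starts being dropped when the queue is nearly full.
PROFILES = {
    "voice": (50, 60, 0.02),
    "bulk": (20, 60, 0.10),
}


def wred_drop_probability(avg_queue, traffic_class):
    """RED drop probability using the thresholds for the packet's class."""
    min_th, max_th, max_p = PROFILES[traffic_class]
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


if __name__ == "__main__":
    for cls in ("voice", "bulk"):
        print(cls, wred_drop_probability(40, cls))
```

With an average queue of 40 in this sketch, bulk traffic is already being thinned out while voice is left alone.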
You can read more about this technique here: http://www.cisco.com/en/US/docs/ios/12_0/qos/configuration/guide/qcconavd.html