Bufferbloat — the Submarine That's Sinking the Net
gottabeme writes "Jim Gettys, one of the original X Window System developers and editor of the HTTP/1.1 spec, has posted a series of articles on his blog detailing his research on the relatively unknown problem of bufferbloat. Bufferbloat is affecting the entire Internet, slowly worsening as RAM prices drop and buffers enlarge, and is causing latency and jitter to spike, especially for home broadband users. Unchecked, this problem may continue to deteriorate the usability of interactive applications like VOIP and gaming, and being so widespread, will take years of engineering and education efforts to resolve. Being like 'frogs in heating water,' few people are even aware of the problem. Can bufferbloat be fixed before the Internet and 3G networks become nearly unusable for interactive apps?"
For what it's worth, TFS seems to be linking into the middle of the story, so maybe that's part of my problem. Still, it's really annoying to be told about this new problem, with a new jargon word, that's going to make the sky fall any day now, without being told just what the hell it is.
The previous article seems to explain things a little better: http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/
Several issues:
1. People who aren't networking engineers don't know about QoS, or don't know/want to know how to configure it.
2. QoS used that way is a hack to work around an issue that doesn't have to be there in the first place.
3. How do you determine the maximum throughput? It's not necessarily the line's official speed. The nice thing about TCP is that it's supposed to figure out on its own how much bandwidth there is. You're proposing a regression to having to tell the system by hand.
4. QoS is most effective on stuff you're sending, but in the current consumer-oriented internet most people download a lot more than they upload.
What Jim is saying is that TCP flows try to train themselves to the dynamically available bandwidth, such that there is a minimum of dropped packets, retransmits, etc.
But in order for TCP to do this, packets must be dropped _fast_.
When TCP was designed, the assumptions about the price of RAM (and thus, the amount of onboard memory in all the devices in the virtual circuit) were different -- namely, buffers were going to be smaller, fill up faster, and signal "I'm full" (by dropping packets) much sooner.
What the experimentation has determined is that many network devices will buffer 1 megabyte or MORE of traffic before finally dropping something and telling the TCP originator to slow down. And yet with a 1 megabyte buffer and a link rate of 1 megabyte per second, it will take 1 second simply to drain the buffer.
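To put numbers on that, here is a minimal sketch of the drain-time arithmetic. The function name and the second example (a 1 Mbit/s uplink, i.e. 125,000 bytes per second) are my own illustrative assumptions, not figures from the article; the first example is the 1 megabyte at 1 megabyte per second case above.

```python
def drain_time_seconds(buffer_bytes: float, link_bytes_per_second: float) -> float:
    """Time for a full buffer to empty onto the link, assuming nothing new arrives."""
    if link_bytes_per_second <= 0:
        raise ValueError("link rate must be positive")
    return buffer_bytes / link_bytes_per_second


MEGABYTE = 1_000_000  # decimal megabyte, to match the round numbers in the text

# The case from the comment: 1 MB buffer, 1 MB/s link.
print(drain_time_seconds(1 * MEGABYTE, 1 * MEGABYTE))  # 1.0

# Hypothetical slower uplink (1 Mbit/s = 125,000 bytes/s) with the same buffer.
print(drain_time_seconds(1 * MEGABYTE, 125_000))  # 8.0
```

Every packet that arrives behind a full buffer waits roughly that long before it even leaves the device, which is where the added latency comes from.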
The pervasive presence of large buffers all along the TCP path, and the unspecified or tail-drop behavior of these large queues, means that TCP's ability to rate limit is effectively nullified. When the link is highly utilized, many degenerate behaviors occur: the overall link has extremely high latency, and bulk traffic causes interesting traffic to be randomly dropped.
Personally, I used pf/ALTQ on OpenBSD to try and manage this somewhat, but it's a dicey business.
My opinions are my own, and do not necessarily represent those of my employer.
Latency is bad? Bigger buffers = more latency?
Buffers increasing latency is not exactly a new phenomenon. It's been observed and taken into design considerations for quite some time. For example, back in the day serial chips essentially had a buffer of one byte. The CPU fed data one byte at a time as the buffer became available, and latency was pretty low since data was immediately transmitted. As more capable serial chips became available, larger buffers were introduced. A newer chip may have a larger buffer, but it may also not transmit data as soon as it has a single byte. It was common to have two programmable thresholds to begin a data transmission: (1) when a certain amount of data has accumulated in the buffer, or (2) when a certain amount of time has elapsed. So if a "packet" to transmit was small enough, it might sit in the buffer until (2), hence more latency with larger buffers. Software that cared generally began to issue flush commands to cause anything in the buffer to be sent immediately.
Network cards and/or the operating system may try to similarly accumulate data before transmitting a packet.
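The two-threshold behaviour described above can be sketched roughly like this. It is a toy model, not the interface of any real serial chip or NIC: the class name, the method names, and the threshold values in the usage lines are all made up for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```python
class ThresholdBuffer:
    """Toy transmit buffer: sends when enough bytes pile up OR enough time passes."""

    def __init__(self, size_threshold: int, time_threshold: float):
        self.size_threshold = size_threshold  # (1) bytes needed to trigger a send
        self.time_threshold = time_threshold  # (2) seconds the oldest byte may wait
        self.pending = bytearray()
        self.oldest = None  # arrival time of the oldest pending byte

    def write(self, data: bytes, now: float) -> bytes:
        """Queue data; return whatever gets transmitted as a result (maybe nothing)."""
        if not self.pending and data:
            self.oldest = now
        self.pending.extend(data)
        return self.poll(now)

    def poll(self, now: float) -> bytes:
        """Transmit if either threshold has been reached, otherwise keep waiting."""
        if not self.pending:
            return b""
        full = len(self.pending) >= self.size_threshold
        stale = now - self.oldest >= self.time_threshold
        return self.flush() if (full or stale) else b""

    def flush(self) -> bytes:
        """Explicit flush: send everything immediately, as latency-aware software would."""
        out = bytes(self.pending)
        self.pending.clear()
        self.oldest = None
        return out


buf = ThresholdBuffer(size_threshold=16, time_threshold=0.010)
print(buf.write(b"hi", now=0.0))  # b'' -- too small, so it sits in the buffer
print(buf.poll(now=0.010))        # b'hi' -- only sent once the timer expires
```

The small write is delayed by the full timer interval unless the caller flushes, which is the extra latency the comment is talking about.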
I'll attempt to translate.
TCP has to be able to estimate how fast* it can send data, because there's no way it can know definitively the link speed, capacity, and reliability between your system and a remote system. It does this by progressively getting faster until it starts detecting transmission problems between the two systems, at which point it backs off and slows down. Ideally, you hit a nice equilibrium at some point.
On a proper network, if some router along the path is at capacity, either internally, or along one of its outgoing paths, it should drop the packets it can't handle in a timely fashion. This seems counterintuitive at first, but remember that TCP handles the guaranteed transmission already - it will retransmit packets that didn't arrive. If the router is holding these packets in a buffer, and sending them along once the links clear up, i.e. "when it gets around to it", the packets will reach their destination with hugely inflated latency. This in turn confuses TCP, as it can't get a reliable estimate of link capacity, and the whole speed negotiation falls apart. The latency becomes wild and unpredictable as packets are sometimes buffered, sometimes not, but they always reach their destination, so TCP thinks it's sending at an acceptable rate. So now you've got all the endpoints conversing through this router that's claiming, "No problem, I can handle it!" when it really can't, and the problem just compounds itself as the router gets slammed harder and harder.
By getting timely notification of dropped packets, TCP can say, "Oh, I'm transmitting too fast for this link, time to shrink the sliding window and slow down." This both smooths out latency, and minimizes further dropped packets, not just for the two hosts involved, but for everyone else transmitting through the affected routes as well. This is how it's supposed to work, but excessive buffering of packets within routers prevents it from happening.
Moral: Dropped packets are perfectly normal and in fact required for TCP to manage its own speed and latency. Stop trying to buffer and guarantee packet delivery - TCP is handling that already.
(Disclaimer: I'm a DBA, not a network engineer. Feel free to clarify or correct anything I've mucked up.)
* "Fast" in this case means "How many packets should I send at once before stopping to wait for acknowledgment of those packets getting where they're going". "Faster" equates to "more of them".
How much bandwidth can I have, though? Take the link between my desktop and a Slashdot server; is the correct answer "1GBit/s, no more" (speed of my network card)? Is it "20MBit/s, no more" (speed of my current Internet connection)? Is it "0.5MBit/s, no more" (my fair share of this office's Internet connection)? In practice, you need the answer to change rapidly, depending on network conditions - maybe I can have the full 20MBit/s if no-one else is using the Internet, maybe I should slow down briefly while someone else handles their e-mail.
TCP doesn't slam the network; it starts off slowly (TCP slow start currently sends just two packets initially), and gradually ramps up as it finds that packets aren't dropped. When packet drop happens, it realises that it's pushing too hard, and drops back. If there's been no packet drop for a while, it goes back to trying to ramp up. RFC 5681 talks about the gory details. It's possible (bar idiots with firewalls that block it) to use ECN (explicit congestion notification) instead of packet drop to indicate congestion, but the presence of people who think that ECN-enabled packets should be dropped (regardless of whether congestion has happened) means that you can't implement ECN on the wider Internet.
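A very rough sketch of that ramp-up-and-back-off cycle, for anyone who wants to see the shape of it. This is a simplified toy model, not the RFC 5681 algorithm in full: it counts whole segments per round trip, assumes a loss happens exactly when the window exceeds a fixed path capacity, and the function name and the capacity of 20 segments are invented for the example.

```python
def simulate_cwnd(capacity_segments: int, rounds: int, initial_cwnd: int = 2) -> list:
    """Return the congestion window (in segments) used in each round trip."""
    cwnd = initial_cwnd
    ssthresh = float("inf")  # no slow-start ceiling until the first loss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity_segments:
            # Assumed loss signal: pushed too hard, so halve the window and restart from there.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: probe gently, one extra segment per round trip.
            cwnd += 1
    return history


print(simulate_cwnd(capacity_segments=20, rounds=12))
# [2, 4, 8, 16, 32, 16, 17, 18, 19, 20, 21, 10]
```

Note that the whole loop depends on the loss showing up in the very next round trip. That timely signal is exactly what oversized buffers take away.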
This works well in practice, given sane buffers; it dynamically shares the link bandwidth, without overflowing it. Bufferbloat destroys this, because TCP no longer gets the feedback it expects until the latency is immense. As a result, instead of sending typically 20MBit/s (assuming I'm the only user of the connection), and occasionally trying 20.01MBit/s, my TCP stack tries 20.01MBit/s, finds it works (thanks to the queue), speeds up to 20.10MBit/s, and still no failure, until it's trying to send at (say) 25MBit/s over a 20MBit/s bottleneck. Then packet loss kicks in, and brings it back down to 20MBit/s, but now the link latency is 5 seconds, not 5 milliseconds.
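Here is a toy illustration of the overshoot described above, under some loudly stated assumptions: time moves in abstract ticks, rates are in packets per tick, the sender bumps its rate by a fixed step every tick until the queue overflows, and the function name and all the numbers are hypothetical. It only shows the trend, not real TCP dynamics.

```python
def overshoot_before_first_drop(link_rate, buffer_packets, start_rate, step,
                                max_ticks=100_000):
    """Ramp the send rate until a tail-drop queue overflows; report what it cost."""
    queue = 0
    rate = start_rate
    for tick in range(max_ticks):
        # The link drains link_rate packets per tick; the rest waits in the queue.
        queue = max(queue + rate - link_rate, 0)
        if queue > buffer_packets:
            # First drop: the buffer is full, so new packets wait behind all of it.
            return {
                "tick": tick,
                "rate": rate,
                "delay_ticks": buffer_packets / link_rate,
            }
        rate += step  # no loss seen yet, so keep speeding up
    return None  # never overflowed within max_ticks


# Small buffer: the sender is told quickly, at a rate just over the link's 20.
print(overshoot_before_first_drop(link_rate=20, buffer_packets=5, start_rate=20, step=1))
# {'tick': 3, 'rate': 23, 'delay_ticks': 0.25}

# Bloated buffer: same link, but the sender overshoots much further before any drop.
print(overshoot_before_first_drop(link_rate=20, buffer_packets=500, start_rate=20, step=1))
# {'tick': 32, 'rate': 52, 'delay_ticks': 25.0}
```

With the big buffer the first drop arrives much later, at a far higher send rate, and with a hundred times the queueing delay already baked in.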
I appear to have a blog. Odd.