Slashdot Mirror


Bufferbloat — the Submarine That's Sinking the Net

gottabeme writes "Jim Gettys, one of the original X Window System developers and editor of the HTTP/1.1 spec, has posted a series of articles on his blog detailing his research on the relatively unknown problem of bufferbloat. Bufferbloat is affecting the entire Internet, slowly worsening as RAM prices drop and buffers enlarge, and is causing latency and jitter to spike, especially for home broadband users. Unchecked, this problem may continue to deteriorate the usability of interactive applications like VOIP and gaming, and being so widespread, will take years of engineering and education efforts to resolve. Being like 'frogs in heating water,' few people are even aware of the problem. Can bufferbloat be fixed before the Internet and 3G networks become nearly unusable for interactive apps?"

22 of 525 comments

  1. Definition, please by Megane · · Score: 5, Insightful

    I'm so glad the term has been defined so that I know what the hell we're talking about here. Oh wait, no it hasn't.

    Okay, then I'll RTFA. Oh wait, two screens worth of text later and it still hasn't.

    I'd like to change the topic now to the submarine that's sinking the English language: jargonbloat.

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    1. Re:Definition, please by Megane · · Score: 5, Informative

      For what it's worth, TFS seems to be linking into the middle of the story, so maybe that's part of my problem. Still, it's really annoying to be told about this new problem with a new jargon word, that's going to make the sky fall any day now, without knowing just what the hell it is.

      The previous article seems to explain things a little better: http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Definition, please by Megane · · Score: 5, Insightful

      Actually, I blame the submitter. It is well known that Slashdot "editors" don't edit. They merely choose the least worthless articles out of the slush pile and push the button, sometimes using copy and paste to combine two similar submissions. Even my above link was still to the middle of the story, but it explains the core concept best.

      I also place a teensy bit of blame on the blogger, for not linking the first use of the word to the previous article. But he couldn't expect to get linked into the middle of the series.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    3. Re:Definition, please by jg · · Score: 5, Insightful

      You asked, I just provided:

      http://gettys.wordpress.com/what-is-bufferbloat-anyway/

      Good question.

      Bufferbloat is the cause of much of the poor performance and human pain using today’s internet. It can be the cause of a form of congestion collapse of networks, though with slightly different symptoms than that of the 1986 NSFnet collapse. There have been arguments over the best terminology for the phenomenon. Since that discussion reached no consensus on terminology, I invented a term that might best convey the sense of the problem. For the English language purists out there, formally, you are correct that “buffer bloat” or “buffer-bloat” would be more appropriate.

      I’ll take a stab at a formal definition:

      Bufferbloat is the existence of excessively large (bloated) buffers in systems, particularly network communication systems.

      Systems suffering from bufferbloat will have bad latency under load under some or all circumstances, depending on if and where the bottleneck in the communication’s path exists. Bufferbloat encourages congestion of networks; bufferbloat destroys congestion avoidance in transport protocols such as HTTP, TCP, Bittorrent, etc. Without active queue management, these bloated buffers will fill, and stay full.

      More subtly, poor latency, besides being painful to users, can cause complete failure of applications and/or networks, and extremely aggravated people suffering with them.

      Bufferbloat is seldom detected during the design and implementation of systems: engineers, methodical people though they are, seldom if ever test latency under load systematically, and today’s memory is so cheap that buffers are often added without thought of the consequences. The bloat can then hide in many different parts of network systems.

      You see manifestations of bufferbloat today in your operating systems, your home network, your broadband connections, possibly your ISP’s and corporate networks, at busy conference wireless networks, and on 3G networks.

      Bufferbloat is a mistake we’ve all made together.

      We’re all Bozos on This Bus.

    4. Re:Definition, please by davidbrit2 · · Score: 5, Informative

      I'll attempt to translate.

      TCP has to be able to estimate how fast* it can send data, because there's no way it can know definitively the link speed, capacity, and reliability between your system and a remote system. It does this by progressively getting faster until it starts detecting transmission problems between the two systems, at which point it backs off and slows down. Ideally, you hit a nice equilibrium at some point.

      On a proper network, if some router along the path is at capacity, either internally, or along one of its outgoing paths, it should drop the packets it can't handle in a timely fashion. This seems counterintuitive at first, but remember that TCP handles the guaranteed transmission already - it will retransmit packets that didn't arrive. If the router is holding these packets in a buffer, and sending them along once the links clear up, i.e. "when it gets around to it", the packets will reach their destination with hugely inflated latency. This in turn confuses TCP, as it can't get a reliable estimate of link capacity, and the whole speed negotiation falls apart. The latency becomes wild and unpredictable as packets are sometimes buffered, sometimes not, but they always reach their destination, so TCP thinks it's sending at an acceptable rate. So now you've got all the endpoints conversing through this router that's claiming, "No problem, I can handle it!" when it really can't, and the problem just compounds itself as the router gets slammed harder and harder.

      By getting timely notification of dropped packets, TCP can say, "Oh, I'm transmitting too fast for this link, time to shrink the sliding window and slow down." This both smooths out latency, and minimizes further dropped packets, not just for the two hosts involved, but for everyone else transmitting through the affected routes as well. This is how it's supposed to work, but excessive buffering of packets within routers prevents it from happening.

      Moral: Dropped packets are perfectly normal and in fact required for TCP to manage its own speed and latency. Stop trying to buffer and guarantee packet delivery - TCP is handling that already.

      (Disclaimer: I'm a DBA, not a network engineer. Feel free to clarify or correct anything I've mucked up.)

      * "Fast" in this case means "How many packets should I send at once before stopping to wait for acknowledgment of those packets getting where they're going". "Faster" equates to "more of them".
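
The probe-and-back-off behaviour described in this comment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not real TCP: `aimd_trace` and its `capacity` parameter are hypothetical names, one loop iteration stands for one round trip, and "loss" is simply declared whenever the window exceeds the assumed path capacity.

```python
def aimd_trace(capacity, rounds, cwnd=1.0):
    """Return the congestion window (in packets) after each round trip.

    capacity is a hypothetical number of packets the path can carry per
    round trip; a window above it is treated as a detected packet drop.
    """
    trace = []
    for _ in range(rounds):
        if cwnd > capacity:
            # A drop was noticed: shrink the window sharply.
            cwnd = max(1.0, cwnd / 2)
        else:
            # No drop: send one more packet per round trip next time.
            cwnd += 1.0
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # The window climbs past the capacity, halves, and climbs again.
    print(aimd_trace(capacity=8, rounds=20))
```

The sawtooth only appears because the drop is noticed one round trip after the window overshoots. If a large buffer hid the overshoot, the window would keep growing, which is the failure the comment describes.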

  2. Awsum, TTY in your name by cerberusss · · Score: 5, Funny

    Jim Getty, one of the original X Window System developers and editor of the HTTP/1.1 spec

    I'd murder four people just to have TTY in my name. Five if I could capitalize them, and postfix with a number. I'd name my son Dev.

    You'd get a business card with something like Dev GeTTY1, Armadillo Avenue 64, Seattle, Washington

    --
    8 of 13 people found this answer helpful. Did you?
  3. First link in the first article by mangu · · Score: 5, Insightful

    Just start RTFAing: "In my last post I outlined the general bufferbloat problem."

    Follow the link:

    "Each of these initial experiments were designed to clearly demonstrate a now very common problem: excessive buffering in a network path. I call this bufferbloat."

  4. pegged connection == latency, who'd of thunk it? by Shakrai · · Score: 5, Insightful

    I read TFA and I'm not seeing the problem. He can't duplicate this issue unless he maxes out his connection and then his latency goes to hell. No shit Sherlock, that's what happens when your pipe is full and the packets have to wait in the queue to be transmitted. Am I stupid or could he avoid this issue entirely by using QoS and/or rate-limiting his connection to some amount <100% of its maximum throughput? I have QoS at the office that keeps our connection from pegging (it's limited to around 75% on the download and 90% on upload) and have never once encountered an issue with latency or jitter. At home I only throttle the upload (to 90% of maximum) and have successfully run VPNs, bittorrent uploads and VoIP calls all at the same time without any headaches.

    Really, what's the problem here?

    --
    I want peace on earth and goodwill toward man.
    We are the United States Government! We don't do that sort of thing.
  5. So, let me get this straight... by CFD339 · · Score: 5, Insightful

    RAM is cheap.
    High speed uplink is not cheap.
    Peering agreements are manipulative, expensive, and sometimes extortionate.

    So...

    The poorly designed, poorly peered, under allocated back haul links can't handle the traffic that routers want to push through them -- but since RAM is cheap, operators just add RAM to the buffers so that when those back-haul lines slow down for a second the packets can get pushed through.

    And we're blaming the buffer for the problem?

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
    1. Re:So, let me get this straight... by phantomcircuit · · Score: 5, Insightful

      TCP assumes that packets will be dropped when there is congestion; if they aren't, the congestion control algorithms fail (hard).

  6. Re:pegged connection == latency, who'd of thunk it by vadim_t · · Score: 5, Informative

    Several issues:

    1. People who aren't networking engineers don't know about QoS, or don't know/want to know how to configure it.
    2. QoS used that way is a hack to work around an issue that doesn't have to be there in the first place
    3. How do you determine the maximum throughput? It's not necessarily the official line's speed. The nice thing about TCP is that it's supposed to figure out on its own how much bandwidth there is. You're proposing a regression to having to tell the system by hand.
    4. QoS is most effective on stuff you're sending, but in the current consumer-oriented internet most people download a lot more than they upload.

  7. You have not RTFA or not UTFA.. by bmajik · · Score: 5, Informative

    What Jim is saying is that TCP flows try to train themselves to the dynamically available bandwidth, such that there is a minimum of dropped packets, retransmits, etc.

    But in order for TCP to do this, packets must be dropped _fast_.

    When TCP was designed, the assumptions about the price of ram (and thus, the amount of onboard memory in all the devices in the virtual circuit) were different -- namely, buffers were going to be smaller, fill up faster, and send "i'm full" messages backwards much sooner.

    What the experimentation has determined is that many network devices will buffer 1 megabyte or MORE of traffic before finally dropping something and telling the tcp originator to slow down. And yet with a 1 meg buffer and a rate of 1 megabyte per second.. it will take 1 second simply to drain the buffer.

    The pervasive presence of large buffers all along the tcp vc, and the non-specified or tail-drop drop behavior of these large queues means that tcp's ability to rate limit is effectively nullified, and in situations where the link is highly utilized, many degenerate behaviors occur, such that the overall link has extremely high latency, and that bulk traffic will cause interesting traffic to be randomly dropped.

    Personally, I used pf/ALTQ on OpenBSD to try and manage this somewhat.. but it's a dicey business.
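
The drain-time arithmetic in this comment (a 1 megabyte buffer on a 1 megabyte per second link takes 1 second to empty) generalises to buffer size divided by link rate. A minimal Python sketch follows; the function name and the extra link rates are illustrative assumptions, not figures from the article.

```python
def drain_time_seconds(buffer_bytes, link_bytes_per_second):
    """Worst-case delay added by a full buffer: its size over the link rate."""
    return buffer_bytes / link_bytes_per_second


if __name__ == "__main__":
    MEGABYTE = 1_000_000
    # Link rates in bytes per second; only the first is from the comment.
    for label, rate in [("1 MB/s (the comment's example)", 1_000_000),
                        ("1 Mbit/s uplink (assumed)", 125_000),
                        ("100 Mbit/s LAN (assumed)", 12_500_000)]:
        print(label, drain_time_seconds(MEGABYTE, rate), "seconds")
```

The same buffer that is harmless on a fast link adds seconds of delay on a slow one, so a fixed buffer size cannot be right for every link speed.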

    --
    My opinions are my own, and do not necessarily represent those of my employer.
  8. Re:pegged connection == latency, who'd of thunk it by TheThiefMaster · · Score: 5, Interesting

    As an extreme example, say you request a 1GB file from a download site. That site has a monster internet connection, and manages to transmit the entire file in 1 second. The file makes it to the ISP at that speed, who then buffers the packets for slow transmission over your ADSL link, which will take 1 hour. During that time you try to browse the web, and your PC tries to do a dns lookup. The request goes out ok, but the response gets added to the buffer on the ISP side of your internet connection, so you won't get it until your original transfer completes. How's 1 hour for latency?

    The situation is only not that bad because:
    A: Most download sites serve so many people at once and/or rate limit so they won't saturate most people's connections
    B: Most buffers in network hardware are still quite small
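
The extreme example above can be put into numbers with a toy first-in, first-out queue. The sketch below is an assumption-laden illustration: `dns_wait_seconds` is a hypothetical helper, the ADSL rate of 250,000 bytes per second is invented for the example, and the bulk data is assumed to arrive all at once before the DNS response.

```python
def dns_wait_seconds(buffer_limit_bytes, link_bytes_per_second,
                     bulk_bytes, packet_bytes=1500):
    """Seconds a late-arriving small packet waits behind queued bulk data."""
    queued = 0
    remaining = bulk_bytes
    while remaining > 0:
        size = min(packet_bytes, remaining)
        if queued + size > buffer_limit_bytes:
            break  # tail drop: the buffer is full, the rest is discarded
        queued += size
        remaining -= size
    # The DNS response sits behind everything already in the queue.
    return queued / link_bytes_per_second


if __name__ == "__main__":
    ADSL_BYTES_PER_SECOND = 250_000   # assumed link rate
    ONE_GIGABYTE = 1_000_000_000
    for limit in (64_000, 1_000_000, ONE_GIGABYTE):
        wait = dns_wait_seconds(limit, ADSL_BYTES_PER_SECOND, ONE_GIGABYTE)
        print(f"buffer {limit} bytes: DNS reply waits {wait} s")
```

With a buffer big enough to hold the whole file, the DNS reply waits for the entire transfer; with a small buffer most of the burst is dropped and the reply waits a fraction of a second.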

  9. Re:I think buffers are a good thing by Coriolis · · Score: 5, Interesting

    He's not arguing against application-level caching. He's saying that too much caching at the IP layer is confusing TCP's algorithm for deciding how fast the link between two points is. This in turn causes massive variability in how fast the data can be downloaded; or in your terms, how fast the video can be buffered (and, in fact, how much buffer the video player needs).

    --
    Rgasuya aata! : I have been coding Perl and cannot tell where my fingers are now!
  10. Re:Looks like a hype by ledow · · Score: 5, Insightful

    You haven't read the article (or the many others around on LWN.net on the same topic). Basically, large buffers in networking gear, from DSL routers on your home network through to ISP's, mean that interactivity is *shite*. You might download Gb's but in terms of interactive applications it's useless and we're facing ever-increasing latency and problems through wanting to cope too much with errors and delays (e.g. huge buffers to keep resending instead of just letting packets drop and having TCP sort it out by retransmission). TCP windows never shrink because errors are buffered and retried so much by intermediate devices that any sort of window scaling is worthless because it doesn't *see* any packet-loss.

    Same devices, smaller buffers, everything works fine and "faster" / "more responsive" all around. It actually would *save* money on new devices because you don't need some huge artificial buffer, you can just drop the occasional packet. But the problem is so deeply embedded into run-of-the-mill hardware that it's almost impossible to escape at the moment and thus EVERYONE from large businesses to home users are running on a completely sub-optimal setup because of it. Almost every networking device made in the last few years has buffers so large that they cause problems with interactivity, bandwidth control, QoS, etc. It's NOT just that a "faster connection" solves the problem - we are getting a percentage of optimal service that's steadily decreasing as buffers increase even though we're improving all the time. That's the point. And it *is* caused by memory prices because memory is so cheap that a huge thoughtless buffer costs no more than a tiny, thought-out buffer.

  11. Re:Concerning Boiled Frogs by TheRaven64 · · Score: 5, Funny

    Only if you use a real frog. You can kill a hypothetical frog in this way.

    --
    I am TheRaven on Soylent News
  12. Re:QoS by Shakrai · · Score: 5, Interesting

    Given that most traffic on a domestic connection is incoming, that doesn't help much.

    It's not that hard to shape downstream traffic. Take a Linux router with two ethernet cards. eth0 is the LAN and eth1 is the internet. You shape eth0 with a maximum throughput of 75%-80% of your line speed. All of the downstream traffic has to go out on that interface so that's your opportunity to shape it. I do this at work and successfully share a 3.0mbit/s connection with 60+ employees. We use latency sensitive services like VoIP and RDP alongside streaming video and other large downloads without any major hassles. It stinks to lose some of your bandwidth because of this (you have to shape it to a number less than 100% of your line speed, otherwise buffering occurs at your ISP and your QoS scheme is defeated) but I'll take responsiveness over throughput any day of the week.
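
The idea of shaping to a fraction of line speed can be illustrated with a token bucket, one common way to enforce a rate limit. This Python sketch is not the commenter's actual router configuration, which is not shown in the thread. `TokenBucket`, `shaped_rate` and the burst size are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Admit packets only as fast as tokens (bytes of credit) accumulate."""

    def __init__(self, rate_bytes_per_second, burst_bytes):
        self.rate = rate_bytes_per_second
        self.burst = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0

    def allow(self, now, packet_bytes):
        # Refill for the time elapsed, never beyond the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False   # caller must delay or drop the packet


def shaped_rate(line_bytes_per_second, percent=80):
    """Rate to shape to, as a whole-number percentage of line speed."""
    return line_bytes_per_second * percent // 100


if __name__ == "__main__":
    line = 375_000                      # 3.0 Mbit/s expressed in bytes/s
    bucket = TokenBucket(shaped_rate(line), burst_bytes=3000)
    for t in (0.0, 0.0, 0.0, 0.01):
        print(t, bucket.allow(t, 1500))
```

Because the limit sits below the real line rate, the queue builds up in the shaper, where the administrator controls it, rather than in the ISP's buffer.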

    --
    I want peace on earth and goodwill toward man.
    We are the United States Government! We don't do that sort of thing.
  13. Yes, buffers can introduce latency by perpenso · · Score: 5, Informative

    Latency is bad? Bigger buffers = more latency?

    Buffers increasing latency is not exactly a new phenomenon. It's been observed and taken into design considerations for quite some time. For example back-in-the-day serial chips essentially had a buffer of one byte. The CPU fed data one byte at a time as the buffer became available and latency was pretty low since data was immediately transmitted. As more capable serial chips became available larger buffers were introduced. A newer chip may have a larger buffer but it may also not transmit data as soon as it has a single byte. It was common to have two programmable thresholds to begin a data transmission, (1) when a certain amount of data has accumulated in the buffer or (2) when a certain amount of time has elapsed. So if a "packet" to transmit was small enough it may sit in the buffer until (2), hence more latency with larger buffers. Software that cared generally began to issue flush commands to cause anything in the buffer to be sent immediately.

    Network cards and/or the operating system may try to similarly accumulate data before transmitting a packet.
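
The two-threshold behaviour described above (send when enough bytes have accumulated, or when enough time has passed, or on an explicit flush) might look roughly like this in Python. It models no particular serial chip; `ThresholdBuffer` and its parameters are hypothetical, and the caller supplies timestamps.

```python
class ThresholdBuffer:
    """Hold outgoing bytes until a size threshold or a timeout is reached."""

    def __init__(self, flush_bytes, flush_after_seconds):
        self.flush_bytes = flush_bytes
        self.flush_after = flush_after_seconds
        self.pending = bytearray()
        self.oldest = None   # arrival time of the oldest held byte

    def write(self, now, data):
        """Queue data; return whatever should be transmitted right now."""
        if not self.pending:
            self.oldest = now
        self.pending += data
        return self.poll(now)

    def poll(self, now):
        """Return bytes to transmit, or b'' if still holding."""
        if not self.pending:
            return b""
        full = len(self.pending) >= self.flush_bytes        # threshold (1)
        stale = now - self.oldest >= self.flush_after       # threshold (2)
        return self.flush() if full or stale else b""

    def flush(self):
        """Send everything immediately, as latency-sensitive software would."""
        out = bytes(self.pending)
        self.pending.clear()
        self.oldest = None
        return out


if __name__ == "__main__":
    buf = ThresholdBuffer(flush_bytes=16, flush_after_seconds=0.5)
    print(buf.write(0.0, b"hi"))   # held: too small, too recent
    print(buf.poll(0.5))           # released by the timer
```

A small write waits the full timeout unless the software calls `flush`, which is the extra latency the comment attributes to larger buffers.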

    1. Re:Yes, buffers can introduce latency by GooberToo · · Score: 5, Insightful

      It doesn't help that massive numbers of people actively insist on breaking protocols which specifically exist to alleviate some of these types of problems.

      Far too many people ignorantly block all ICMP traffic. As a result, the network path in between the two communicating hosts is forced to buffer more data as the destination host becomes saturated. Worse, this type of filtering has a tendency to quickly compound, which in turn creates the exact type of bufferbloat he's describing.

      I wish people would understand there is a difference between, "No route to host", and a black hole. When you find a black hole, chances are really good you've found a host. As such, purposely breaking protocols for people to have an imagined increase in security only breaks the Internet as a whole when it becomes a wide spread tactic. And before people start rattling off that it opens a whole new can of worms, please realize that unlike in the past, stateful firewalls are extremely common today - so no.

  14. Jim Gettys did the world a great service with this by iwbcman · · Score: 5, Interesting

    I discovered this series of blog posts about 2 months ago, when he accidentally published one of his blog posts prematurely. I started reading it and followed the links and saw that this was like a sleuth tale-if I had started reading this with his very first blog on the topic I would have had no idea where he was going with this. Now as to why this contribution by Jim Gettys does the world a great service:
    • Gettys is not pointing fingers at someone. The problem he is describing is truly vast, and involves lots of different people in different industries(router manufacturers, ISP's, kernel driver authors, carrier grade network manufacturers, etc.) with, presumably, a myriad of different intentions. The problem has been building over a long time-this didn't start yesterday, and won't be solved in a short time span, without a concerted effort on the part of everyone involved in all of these divergent industries, who often have quite divergent interests.
    • This approach that Gettys takes allows him to describe a problem which confronts everyone. By taking the high road and not pointing fingers he is able to address an issue in such a way that a lot of the people who did contribute to this problem can recognize what they have done and own it, without being labelled, accused or feeling attacked. This should be a lesson to anyone who really wants to redress an issue that affects everyone.
    • Gettys develops this theme over many, many blog posts. It makes for some of the best internet reading I have experienced in years. Things only gradually become clearer-not merely what the problem is, but also all of the issues involved in it. I can read away in the internet for months at a time and not learn as much as I did by reading this series of posts.
    • Gettys knows what he is talking about. He developed this theme by talking with lots of experts -engineers at the ISP, people who played a pivotal role in the creation of the www and network specialists. He himself is not a network specialist, but he went out and met with people to discuss his findings and took clues and information from these exchanges to inform him and his quest to find out what was going on.
    • The series is short on answers. It may prove frustrating to many that he offers so little in the way of solutions to this problem. But this is due to the fact that the problem cannot be resolved by you, the end user. To solve this problem means rearchitecting countless millions of devices and altering hundreds of thousands of lines of code in multiple OS's.
    • Failure to redress this problem means that every effort to decrease latency by upping available bandwidth or upgrading network infrastructure will fail to deliver. If packets are not dropped fast, due to excessive buffering, the negotiation process fails, which invariably means congestion, which means latency-only something that addresses this issue has any chance of actually effecting change. Saying that this problem is just an issue already solved by QOS shows that you don't understand the problem.
    • One of the first thoughts I had reading this was: if the techs on Wall Street read this article they will inevitably exploit this issue to win precious milliseconds on the stock exchange-ring a bell?
    • Any ISP could exploit this issue to offer a relative market advantage. Sadly when resolving an issue is in everybody's interest, market players will exploit the issue for their own relative gain. Getting everyone to actually tackle this is going to be a gargantuan task.

    Hats off to Jim Gettys. Thanks for your service.

  15. Re:Concerning Boiled Frogs by zm · · Score: 5, Funny

    Use a lid.

    --
    Sig ?
  16. Things change at large scale by farnz · · Score: 5, Informative

    How much bandwidth can I have, though? Take the link between my desktop and a Slashdot server; is the correct answer "1GBit/s, no more" (speed of my network card)? Is it "20MBit/s, no more" (speed of my current Internet connection)? Is it "0.5MBit/s, no more" (my fair share of this office's Internet connection)? In practice, you need the answer to change rapidly, depending on network conditions - maybe I can have the full 20MBit/s if no-one else is using the Internet, maybe I should slow down briefly while someone else handles their e-mail.

    TCP doesn't slam the network; it starts off slowly (TCP slow start currently sends just two packets initially), and gradually ramps up as it finds that packets aren't dropped. When packet drop happens, it realises that it's pushing too hard, and drops back. If there's been no packet drop for a while, it goes back to trying to ramp up. RFC 5681 talks about the gory details. It's possible (bar idiots with firewalls that block it) to use ECN (explicit congestion notification) instead of packet drop to indicate congestion, but the presence of people who think that ECN-enabled packets should be dropped (regardless of whether congestion has happened) means that you can't implement ECN on the wider Internet.

    This works well in practice, given sane buffers; it dynamically shares the link bandwidth, without overflowing it. Bufferbloat destroys this, because TCP no longer gets the feedback it expects until the latency is immense. As a result, instead of sending typically 20MBit/s (assuming I'm the only user of the connection), and occasionally trying 20.01MBit/s, my TCP stack tries 20.01MBit/s, finds it works (thanks to the queue), speeds up to 20.10MBit/s, and still no failure, until it's trying to send at (say) 25MBit/s over a 20MBit/s bottleneck. Then packet loss kicks in, and brings it back down to 20MBit/s, but now the link latency is 5 seconds, not 5 milliseconds.
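
The ramp-up described in the last paragraph can be mimicked with a very coarse model. The Python sketch below is only a rough illustration: `peak_queue_delay` is a hypothetical function, time advances in whole seconds, rates are in Mbit per second, the buffer is in Mbit, and the sender halves its rate as soon as the buffer overflows. Real TCP reacts per round trip and is considerably more subtle.

```python
def peak_queue_delay(bottleneck, buffer_limit, start_rate, step, ticks):
    """Largest queueing delay (seconds) seen while a sender probes for rate.

    bottleneck and start_rate are in Mbit/s, buffer_limit in Mbit,
    and each tick is one second.
    """
    rate = start_rate
    queue = 0.0
    peak = 0.0
    for _ in range(ticks):
        # Whatever arrives faster than the bottleneck drains is queued.
        queue = max(0.0, queue + rate - bottleneck)
        if queue > buffer_limit:
            queue = buffer_limit   # tail drop: the overflow is lost
            rate = rate / 2        # the sender finally sees loss and backs off
        else:
            rate += step           # no loss seen, so keep speeding up
        peak = max(peak, queue / bottleneck)
    return peak


if __name__ == "__main__":
    # A 20 Mbit/s bottleneck with a bloated buffer and with a small one.
    print(peak_queue_delay(20, 100, 20, 1, 100))   # 100 Mbit buffer
    print(peak_queue_delay(20, 0.1, 20, 1, 100))   # 0.1 Mbit buffer
```

In this model the worst delay is simply the buffer size divided by the bottleneck rate: a 100 Mbit buffer yields 5 seconds and a 0.1 Mbit buffer about 5 milliseconds, matching the two latencies the comment contrasts.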