Bufferbloat: Dark Buffers In the Internet
Expanding on earlier work from Jim Gettys of Bell Labs with a new article in the ACM Queue, CowboyRobot writes that Gettys "makes the case that the Internet is in danger of collapse due to 'bufferbloat,' 'the existence of excessively large and frequently full buffers inside the network.' Part of the blame is due to overbuffering; in an effort to protect ourselves we make things worse. But the problem runs deeper than that. Gettys' solution is AQM (active queue management) which is not deployed as widely as it should be. 'We are flying on an Internet airplane in which we are constantly swapping the wings, the engines, and the fuselage, with most of the cockpit instruments removed but only a few new instruments reinstalled. It crashed before; will it crash again?'"
Cringely has been writing about this all year. He cites Jim Gettys too. See: http://www.cringely.com/tag/bufferbloat/
I paid the going retail price for a Windows screen reader and got a free Unix computer!
To configure your active queue management, the first thing I need to know is: do you have a push system, or a pull system?
Neither, sir, we have a suck system.
You never want a full buffer. At that point, it ceases to do its job.
Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.
Someone had to do it.
That is actually the exact problem. You do not want buffers larger than the flight time of your circuit. You absolutely want the buffers to fill and drop packets otherwise.
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.
Yes and no...
If you don't (or only rarely) fill your buffer, a smaller buffer introduces less latency than a large one, while still allowing you to maximize throughput. If, however, you usually have your buffer full, you increase latency for literally no benefit, since you've already maximized throughput simply through resource demand.
The former will occur when your average load falls below your actual bandwidth, and allows you to get the most out of your link. The latter occurs when you consistently exceed your bandwidth, in which situation you may as well not even have a buffer, because it only increases latency without increasing throughput. That describes TFA's real point.
What he suggests amounts to actively choosing between those two conditions - If your average demand falls below your link speed, a larger buffer will help smooth the load over time. If, however, your average demand exceeds your link speed, throw away the buffer because it doesn't help.
But as per the GP's point - If you have an always-full buffer, you literally gain nothing but latency.
The problem with buffers is that most of the time they are configured as a fixed size in bits. They need to be sized based on the bit flight time of the circuit, which is the delay (in ms) times the throughput (in bits per second), i.e. the bandwidth-delay product. A mismatch between those values is a problem in *either* direction, especially on the oversized side, once queued packets sit past the retransmit threshold.
Buffers should be dynamically sized based on the flight time of data on the specific link, and ideally kept updated. WRED is also highly suggested.
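A rough sketch of that sizing rule, assuming the target is one bandwidth-delay product (link rate times round-trip time). The function name and the example links are made up for illustration, not taken from any vendor's configuration:

```python
def bdp_bytes(link_bits_per_sec: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes 'in flight' on the link for one RTT."""
    return link_bits_per_sec * (rtt_ms / 1000.0) / 8.0


# Hypothetical links; the rates and RTTs are illustrative assumptions.
examples = [
    ("1.2 Mbit/s DSL, 25 ms RTT", 1.2e6, 25),
    ("1 Gbit/s link, 100 ms RTT", 1e9, 100),
]

for name, rate, rtt in examples:
    print(f"{name}: about {bdp_bytes(rate, rtt):,.0f} bytes of buffer")
```

The point of the arithmetic is that the "right" size differs by orders of magnitude between links, which is why one fixed buffer size per card or chassis fits almost none of them.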
What really exacerbates the issue is devices with buffers that must be the same size for all links on X (be it card, slot, or chassis).
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
What he suggests amounts to actively choosing between those two conditions - If your average demand falls below your link speed, a larger buffer will help smooth the load over time.
That's a pretty simplified way of putting it, but basically correct. Major equipment vendors have been slow to adopt more advanced queuing strategies (Stochastic Fair Queuing integrated with some of the more advanced flavors of early discard.) Fortunately we're budgeted for and piloting a shaper for purchase soon, and this time around have a chance to get something both well supported and cutting edge.
Personally I pine for ATM's ABR CoS with its fast end-to-end congestion notification, but as history has shown us, the inevitable fate of the tech world is for the inferior to be gradually, painfully, and kludgingly adapted to become the same thing as the technologies it displaced through lowballing. In this case, that inferior thing being IP/ethernet.
Someone had to do it.
Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.
Exactly.
Early Congestion notification along with ONLY a minimal amount of client side buffering is really all you need.
Deep buffers just make it worse for everyone.
Oh, and just as a car analogy is inappropriate for describing TCP traffic, the airplane analogy is worse.
Sig Battery depleted. Reverting to safe mode.
Maybe posting a new article on an issue that was also an issue a year ago is not a "dupe", but an acceptable and possibly even normal thing for a news site to do?
Except it is alarmist. The current situation isn't optimal, but not being optimal and having a critical issue are two different things. The crux of the problem is basically "Long delays from bufferbloat are frequently attributed incorrectly to network congestion, and this misinterpretation of the problem leads to the wrong solutions being proposed." That means administrators *might* mistake large-buffer slowdowns for other causes of network congestion. Ideally, it should definitely be dealt with better, but it's hardly a collapse of the network.
A network buffer acts just as that, a buffer to smooth out traffic spikes. A buffer does this at the cost of latency. If a buffer is large AND consistently full, the network link is always fully utilized, so the large buffer isn't needed; it just adds a large latency on top of waiting for the link to clear, for no benefit (that the extra latency *may* confuse administrators is basically the "danger"). On the other hand, if the link is underutilized most of the time, then a large buffer is beneficial for dealing with traffic spikes. The majority of networks are the latter and hence are designed as such. Two solutions: get faster links or deal with it more intelligently.
A replacement for PATA or PCI has to interoperate only with other components in the same chassis, or possibly on the same desk in the case of eSATA and Thunderbolt. A replacement for TCP would have to interoperate with every other computer in the world. Imagine what a flag day that would be.
Feel free to try it out yourself...I have and the problem is real.
And just maybe some of us are interested in how research has progressed since the last article...
Democracy is a sheep and two wolves deciding what to have for lunch. Freedom is a well armed sheep contesting the issue
Each hop has its own buffer. Endpoints can fix their own buffers, but they can't do anything about buffering in the next hop. If something changes in the network to reduce the available bandwidth, the ideal behaviour is for packets to start getting dropped right away so that the originator gets notified of the drop and can slow itself down to compensate.
If some device in the core network just buffers up seconds' worth of packets instead of dropping them, it destroys the ability of the sender to adapt to the changing conditions.
What we need is a ferry analogy.
Packet transmission is like a ferry crossing a river, except that the ferry sets off when it is full rather than at set times.
People wait at the shore and generally don't have to wait too long as the ferry is pretty fast and only needs a few people to fill up. For most people, walking onto the ferry involves very little waiting before the ferry actually departs and crosses the river.
Buffer bloat is when big buffers act like ferries with huge capacity. People enter a huge 2000-passenger boat, and are let on by the hundreds with seemingly no delay. But the ferry will not depart until it is reasonably full. So the people who got on first may have to wait for hours before the ferry actually departs and crosses the river.
It is clear that bigger ferries are no substitute for more ferries....or smaller rivers. Or possibly a bridge. In any case, you can get away without introducing cars or airplanes, so my job is done here.
May the Maths Be with you!
No, buffers are not for "rare peaks". They are fundamentally required due to the physics of data transmission from one device to another, especially when the link speeds are different from one hop to the next.
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
But that's the point, the buffers smooth the link, but not the streams going across them. At enough of a buffer bloat, the buffers actually make the link have to retransmit the same data multiple times due to the design of TCP congestion avoidance.
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
As soon as I start trying to shove (or suck) more bits through the pipe than it can handle, round trip latency to "nearby" points of the Internet increases from ~25 ms to ~1 second. When I need to transfer a lot of data, I use rsync or wget if at all possible, and throttle the transfer to just below the rate the connection can handle; this results in ping times staying sane while only slowing down the transfer slightly. We shouldn't need to resort to doing stuff like this to make the network function properly!
This analogy is like a bathtub, full of spiders, and on fire. It sounds dangerous, but it's self limiting.
“Common sense is not so common.” — Voltaire
The bad Bufferbloat setup is on the left (yellow dots), and the 'good' setup (i.e. how things used to be configured about 10-20 years ago when RAM was more expensive!) is on the right (cyan/blue dots).
Both sides start off okay, but notice how the left side 'queues' (tall yellow dot columns) keep on growing over time, while the right side blue columns stop short because of the small buffer size. As they stop short, some data 'packets' must be dropped, and this gets reported back to the upload site that it's shoving data to the user too fast. As a result, the upload site temporarily slows the sending of data, and thus the system self-corrects.
Meanwhile, on the left side, these packets of data never get dropped, so the giant bloated yellow buffers get filled more and more, but the computer at the upload site doesn't realise the carnage of these giant queues further down the line, and instead thinks "All is okay, let's keep sending data fast!".
Finally, when a smaller piece of data needs to be sent to the user (see 2:30+ signified by red dots on the left and dark blue dots on the right), the left side shows the red dots (which could be say, a small email) wading through giant queues to reach their destination, really slowly. Furthermore these tiny bits of data often need special 'emergency' treatment as they hold up other larger data associated with it. On the good right side, the dark blue dots have no such giant queues.
That analogy doesn't quite do the trick. TCP windowing is a bit more sophisticated than that. You can think of it maybe as a commander sending couriers out to support a mobile squad through hostile territory. If too many of them never make it to the squad, or back, he sends them less frequently so they can sneak through more discreetly. If the troops make it through then he sends them faster because the more ammo he can get through the better. But he also has to decide how many men to put on courier duty. If the couriers take too long the squad has obviously moved further away from the base, and if he waited for the next one to return, he wouldn't be sending enough ammo. If the couriers return quickly, he can make do with fewer couriers.
Big buffers are like a flimsy rope bridge in the couriers' path that takes a long time to cross. Couriers have to wait on one side because only one can cross at once, but the large groups waiting at the side of the cliff are more likely to get attacked. Until they do get attacked, however, the commander starts to think the squad has moved very far away, so he puts more couriers on duty. And since he thinks the squad is far away, he does not expect the couriers back for a longer time, so it takes him longer to realize that they are starting to go missing entirely.
One of the best solutions to this problem turns out to be for some of the couriers to randomly go AWOL, and for more of them to go AWOL the bigger the crowd at the rope bridge gets. This basic concept is called Random Early Discard, and there have been a lot of ways invented for deciding who goes AWOL and why. If some of the couriers go AWOL, the commander thinks they are being attacked, so he slows down and also takes some troops off courier duty.
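For the non-courier version, here is a minimal sketch of the drop decision described above. It keeps only the core idea (drop probability rises with the average queue length) and leaves out the moving-average and packet-count refinements of real RED implementations. The thresholds and `max_p` defaults are illustrative assumptions, not recommended settings:

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Probability of dropping an arriving packet, given the average queue length."""
    if avg_queue < min_th:
        return 0.0  # short queue: nobody goes AWOL
    if avg_queue >= max_th:
        return 1.0  # queue too long: drop everything that arrives
    # In between, the probability ramps linearly from 0 up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th=5, max_th=15, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop; rng is injectable so it can be tested."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


for qlen in (2, 8, 12, 20):
    print(qlen, red_drop_probability(qlen, 5, 15, 0.1))
```

Because the drops are random and start early, senders see loss (and slow down) well before the queue is full, instead of all at once when it overflows.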
Someone had to do it.
If you look at buffers allocated to fast multi-gigabit interfaces at the core of the network they are simply not large enough compared to forwarding rates involved to be able to induce the kinds of delays needed to cause Internet wide problems.
You can argue they may not be ideal for real time voice, game or video communication when these links are oversubscribed but no doomsday is possible.
Today buffer bloat effects are mostly observed at the edge even though they need not always be.
Failure of a congestion control algorithm to control link saturation does not translate into congestive collapse of the larger network. It just results in *your* network connection turning to shit. When Netalyzr runs, it intentionally saturates your link at that time. In the real world only a few portions of the edge are ever saturated to the extent that congestion control failure becomes an issue leading to more packets through core routers. The number of edge machines in this category would need to be significant to cause a rerun of previous issues.
That condition cannot be met, because the system limits itself. If everyone maxed out their pipes at once, the core would saturate first, since available edge bandwidth is grossly over-provisioned relative to core bandwidth. That caps edge saturation, which in turn ensures congestion control algorithms function properly.
I'm not arguing there is not a problem or more can't be done. I'm just arguing the doomsday congestive collapse scenario is bullshit.
That's a pretty simplified way of putting it, but basically correct. Major equipment vendors have been slow to adopt more advanced queuing strategies (Stochastic Fair Queuing integrated with some of the more advanced flavors of early discard.)
Right. The problem is not big buffers, per se. It's big dumb FIFO queues. There's nothing wrong with one big flow, like a file transfer, having a long latency, provided that other flows with less data in flight aren't stuck behind it. That's what "fair queuing" is all about. Each flow has its own queue, and the queues are serviced in a round-robin fashion. (With stochastic fair queuing, some hashing is done to eliminate some of the bookkeeping on flows, but the effect is roughly the same.)
I figured this out in the early 1980s (see RFC 970) and by the late 1990s, it was an established technology. We shouldn't be having this problem at this late date.
I wonder how much of the trouble comes from devices that are doing TCP-level processing in the middle of the network. Stateful firewalls and ISP ad-insertion engines can introduce substantial latency.
If you want to test for bad behavior, try running two flows, one that never has more than one packet outstanding, and one that just does a big file-transfer like operation like a download. If the latency of the low-traffic flow goes up to the same as that of the bulk flow, there's a big dumb buffer in the middle. If the packet loss rate of the low-traffic flow goes up, there's a small dumb buffer in the middle.
You don't necessarily have to size them in flight time of the circuit.
What you can do is have huge buffers, but just drop packets that are older than say 50 milliseconds since the time they entered the device (if the link/hop is supposed to be fast and low latency).
If the link is slow and/or high latency, you may wish to use higher values - 100 milliseconds. But not too high. I'm no networking expert but I don't really see the purpose of adding hundreds of milliseconds to a hop just to save a few packets that are likely to be dropped anyway, or should be dropped as an indirect signal that whoever is sending those packets should slow down.
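A minimal sketch of that idea, assuming a plain FIFO that stamps each packet on arrival and discards anything older than the limit when it reaches the head of the queue. The class name and the explicit `now` arguments are my own choices for illustration (a real device would read a clock); the 50 ms default is just the figure mentioned above:

```python
from collections import deque


class AgeLimitedQueue:
    """FIFO that drops packets which have waited longer than max_age_s."""

    def __init__(self, max_age_s=0.050):
        self.max_age_s = max_age_s
        self._q = deque()  # entries are (arrival_time, packet)
        self.dropped = 0

    def enqueue(self, packet, now):
        self._q.append((now, packet))

    def dequeue(self, now):
        """Return the next packet that is still fresh, or None if there is none."""
        while self._q:
            arrived, packet = self._q.popleft()
            if now - arrived <= self.max_age_s:
                return packet
            self.dropped += 1  # stale: discard, which signals the sender to slow down
        return None


q = AgeLimitedQueue(max_age_s=0.050)
q.enqueue("old", now=0.00)
q.enqueue("fresh", now=0.04)
print(q.dequeue(now=0.06), q.dropped)
```

However large the underlying memory is, the delay this hop can add is bounded by `max_age_s`.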
Except that Cogent is the trouble instigator in every single instance. They are like the little trouble maker who runs around poking, kicking, pinching, insulting the larger kid at the playground, yet cries loudly and acts very hurt when he gets slapped in return.
That is actually the exact problem. You do not want buffers larger than the flight time of your circuit. You absolutely want the buffers to fill and drop packets otherwise.
You talkin' smack, fool? I will end you! I bloat like a buffer, sting like a TCP!
These posts express my own personal views, not those of my employer
Bloated or big buffers cause more latency than necessary only if they are designed with a single queue for all flows. If each flow gets its own queue in the buffer and all queues are read and sent out round-robin, a ping packet would not have to wait until an earlier big file transfer, which has completely filled the buffer, is through. The ping packet would practically overtake the large amount of queued bytes of the big file transfer instead of waiting behind it in a single queue.
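To make the overtaking concrete, here is a toy per-flow queue serviced round-robin. It is only a sketch: it takes one packet per flow per turn and ignores packet sizes, so it is packet-fair rather than byte-fair, and the flow names are made up:

```python
from collections import OrderedDict, deque


class RoundRobinFairQueue:
    """One FIFO per flow; flows with queued packets take turns sending."""

    def __init__(self):
        self._flows = OrderedDict()  # flow id -> deque of packets, in service order

    def enqueue(self, flow_id, packet):
        self._flows.setdefault(flow_id, deque()).append(packet)

    def dequeue(self):
        if not self._flows:
            return None
        # Serve the flow at the front, then move it to the back if it has more.
        flow_id, q = next(iter(self._flows.items()))
        packet = q.popleft()
        del self._flows[flow_id]
        if q:
            self._flows[flow_id] = q
        return packet


fq = RoundRobinFairQueue()
for i in range(5):
    fq.enqueue("bulk", f"bulk{i}")  # a file transfer fills the buffer first
fq.enqueue("ping", "ping")          # then a single small packet arrives

order = [fq.dequeue() for _ in range(6)]
print(order)  # ['bulk0', 'ping', 'bulk1', 'bulk2', 'bulk3', 'bulk4']
```

In a single FIFO the ping would have left sixth; here it leaves second, no matter how many bulk packets were queued ahead of it.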
You know, like, measuring things? Where does the problem happen? Under what circumstances?
You mean, like figure-2 or even better, figure-5, in TFA? Where the (most common) 2^n buffer sizes stand out so obviously in the data that you'd need to try not to notice the trend?
Of course, this situation doesn't actually require much "real" data to prove. If each 1500 byte packet takes 10ms to transmit, and you have a full 256KB buffer - Which will unavoidably happen any time you try to sustain a transmit faster than your link can handle - You will have 1.7 seconds of latency in a FIFO queue.
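The arithmetic behind that figure, as a small sketch (the function name is just for illustration; the numbers are the ones quoted above):

```python
def queue_delay_s(buffer_bytes, packet_bytes, packet_tx_s):
    """Time for a packet at the tail of a full FIFO to reach the wire."""
    packets_ahead = buffer_bytes / packet_bytes
    return packets_ahead * packet_tx_s


# 1500 bytes in 10 ms corresponds to a 1.2 Mbit/s link.
link_bits_per_sec = 1500 * 8 / 0.010
delay = queue_delay_s(256 * 1024, 1500, 0.010)

print(f"link rate: {link_bits_per_sec / 1e6:.1f} Mbit/s")
print(f"full-buffer delay: {delay:.2f} s")  # about 1.75 s
```

The delay scales linearly with buffer size and inversely with link rate, so the same 256KB buffer that is harmless on a fast link is painful on a slow one.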
tldr
Don't worry, we could tell.
Early Congestion notification along with ONLY a minimal amount of client side buffering is really all you need.
Unfortunately, early notification doesn't work with a ton of wireless devices. Their drivers offer minimal control and they always send data at the speed they negotiated, e.g. if they connect at 11g they always send data at that speed and always send acks with window size adjustments to speed traffic up to that speed until they receive multiple window size adjustments telling them to STFD. Wireless devices are the dumbest things ever to be unleashed on the net and they are multiplying.
Having to work for a living is the root of all evil.
In any case, you can get away without introducing cars
The only ferries I've ever been on were for cars, so you kinda did. Maybe you should've used a paraglider analogy, I've never seen a car use a paraglider.
What can I do with my own laptop and wifi router to make my own situation better?
You, sir, did not understand which buffers are being discussed. Buffering a movie in no way compares to TCP buffering.
And the right balance between buffer size, drop percentage, and throughput should be measurable. But I bet those lazy bastards at cisco have never thought to measure performance, which is why no one uses their equipment.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Umm ... your 1500 byte packet had best not take more than about 10 us to transmit. 10ms would be quite ridiculous.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
We're talking about 40+Gbit/sec internet backbones in this article, not end user connections.
:)
The entire first half of TFA talks about his 1Mbit WiFi connection. I care about my paltry 1.2Mbit internet connection.
Now in fairness, the same problem does indeed apply at the backbone level... But I didn't call you "ridiculous".
You can have an overall congested network. I've seen this on occasion.
But it is much easier (and far more common) for you, or people in your house, to do it to yourself than for the overall ISP network to be congested. This is something a simple file copy can and does do to you in practice.
Some ISPs run AQM properly (e.g. RED) in the cores of their networks; some do not. On the ones that do not, you'll see problems at peak hours. Similarly on corporate networks.