Bufferbloat: Dark Buffers In the Internet

Cringely again... by beetle496 · 2011-12-02 14:34 · Score: 4, Informative

Cringely has been writing about this all year. He cites Jim Gettys too. See: http://www.cringely.com/tag/bufferbloat/

--
I paid the going retail price for a Windows screen reader and got a free Unix computer!

Re:Cringely again... by hairyfeet · 2011-12-02 15:08 · Score: 4, Interesting

Uhhh...I just read the link but am a little confused. Even Cringely points out at the first of his article that originally TCP was written for a VASTLY different and weaker network than we have now, so instead of trying to make the networks go back to a mid 1980s design, wouldn't it be smarter just to update TCP to take advantage of new tech advances?
Now I'm not really a heavy network guy so excuse me if I put it in more of my lingo, but let's compare it to something I've got more first hand experience with, hard drives. If you don't write the controller code to take advantage of the large cache, frankly the cache becomes worthless, but if you DO write the controller code to take into account the size of the buffer it makes a BIG difference, so much so that I've seen 5400RPM drives whip 7200RPM drives simply by having better cache management.
So wouldn't the right way to go be to update TCP for the times? I mean we didn't slow computers down so we could keep PATA or PCI, we came up with new tech like SATA and PCIe to take advantage of the faster throughput. Shouldn't we do the same here as well?

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Cringely again... by mellon · 2011-12-02 15:40 · Score: 5, Insightful

Buffer and cache are not the same thing. Packets are written to a buffer once and read from it once. Caches are useless if, on average, blocks aren't read from them more than they are written to them. So treating them as analogous is highly misleading.
The deal with throughput is that you can only win by storing packets if there is going to be room to send them without delay. If you buffer every packet that's sent, it does get delivered, but by the time it gets to its destination, it's too late. You can adjust the TCP algorithms to behave somewhat less badly in this situation, but what you can't do is get genuine flow control with big buffers, because the endpoints have no way to determine the throughput of the network.
The only way the endpoints can determine the throughput of the network is if packets get dropped when there's congestion. When packets don't get dropped, what you see is that whenever there is more traffic to send over the link than the link can hold, it just winds up in a buffer. Latency rises. Eventually all the senders give up. Then the buffers start to drain, and packets get delivered. Then the acks start coming back. Now the endpoints think they are on a high latency link, so they crank back up again and fill the buffers again.
So what you see is a network that works great as long as the total load presented to the network is less than the aggregate capacity of the network. As soon as the demand for bandwidth exceeds the supply, every single stream starts to stall. If you've stayed at a hotel recently, you've seen this: a dozen people try to watch video streams over a fairly wimpy connection, and then you can't do _anything_ over the connection, because the buffer fills up.
If you didn't have that giant buffer, all the endpoints would be able to tell that the link was congested, and would slow down. If the total available bandwidth wasn't enough, the video streams would basically fail, but you could still get mail and surf the web. But with bufferbloat, not only can't you watch video streams, you also can't surf the web or get email or ssh to your server.
You can see this by pinging a server somewhere out on the internet. When the link isn't congested, you'll see reasonable round trip times, typically 100ms. Then when it gets congested, you'll see packets dropped, and you'll see the RTT rise to as much as a minute. Then as all the senders notice that their packets aren't being delivered, they back off and suddenly the RTT starts to drop again, and you start to hope the network's been fixed. But it's fool's gold: as soon as the senders notice, they bomb the buffer again, and the RTT goes back up. Rinse and repeat until you give up.
You probably don't see this very often on your home link, because you probably aren't saturating it. But it happens a lot at Wifi hotspots in particular, and also sometimes on 3G networks. It's quite disheartening, particularly when you're paying for the connection. You also see it on big ISPs like Comcast when you try to reach content providers that aren't willing to pay the ransom to Comcast to get on their uncongested link.
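To put a rough number on how a full buffer turns into the multi-second round trip times described above, here is a minimal sketch. The function name and the example figures are made up for illustration, and it assumes a single FIFO buffer in front of a fixed-rate link.

```python
def full_buffer_delay_s(buffer_bytes, link_bits_per_s):
    """Seconds a newly arrived packet waits behind an already-full FIFO buffer."""
    return buffer_bytes * 8 / link_bits_per_s


# Hypothetical figures: a 1 MB buffer in front of a 2 Mbit/s link.
delay = full_buffer_delay_s(1_000_000, 2_000_000)
print(f"{delay:.1f} s of queueing delay added to every packet")
```

The delay scales linearly with the buffer size, so a bigger buffer on the same slow link only makes the worst case longer.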
Re:Cringely again... by m.dillon · 2011-12-02 16:41 · Score: 5, Interesting

Well, you definitely CAN tell when one or more buffers along the path begins to fill up, because latency increases. Packet loss is not necessary and, in fact, packet loss just makes the problem worse since many TCP connections implement SACK now and can keep the bandwidth saturated even in the face of packet loss.
The ideal behavior is probably not to start dropping packets immediately... eventually, sure, but definitely not immediately. Ideally what you want to do is to attempt to shift the problem closer to the edges of the network where it is easier to fairly apportion bandwidth between customers.
Send-side bandwidth limiting is very easy to implement since TCP already has a facility to collect latency information in the returned acks. I wrote a little beastie to do that in FreeBSD many years ago, and I turn it on in DragonFly releases by default.
The purpose of the feature is not to completely remove packet buffering from the network, because doing so would put the sending server at a severe disadvantage versus other servers that do not implement similar algorithms (which is most of them).
The purpose is to unload the buffers enough such that the algorithms in the edge routers aren't overloaded by the data and can do a better job apportioning bandwidth between streams.
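As a rough illustration of the general idea of backing off on rising latency rather than on loss, here is a minimal sketch. It is NOT the FreeBSD/DragonFly code mentioned above; the class name, the 1.25 slack factor and the 0.9 / 1.02 adjustment steps are all assumptions picked for the example.

```python
class DelayBasedLimiter:
    """Toy send-rate limiter driven by RTT samples taken from returning acks."""

    def __init__(self, rate_bps, min_rate_bps=8_000, slack=1.25):
        self.rate_bps = float(rate_bps)
        self.min_rate_bps = float(min_rate_bps)
        self.slack = slack          # tolerated RTT inflation over the baseline
        self.base_rtt = None        # lowest RTT seen so far (empty-queue estimate)

    def on_ack(self, rtt_s):
        """Update the allowed send rate from one RTT sample; return the new rate."""
        if self.base_rtt is None or rtt_s < self.base_rtt:
            self.base_rtt = rtt_s
        if rtt_s > self.base_rtt * self.slack:
            # RTT well above baseline: some buffer on the path is filling, back off.
            self.rate_bps = max(self.min_rate_bps, self.rate_bps * 0.9)
        else:
            # Path looks uncongested: probe gently for more bandwidth.
            self.rate_bps *= 1.02
        return self.rate_bps


limiter = DelayBasedLimiter(rate_bps=1_000_000)
limiter.on_ack(0.050)          # baseline sample, rate creeps up
print(limiter.on_ack(0.200))   # inflated RTT, rate is cut
```

The sender never has to see a dropped packet here; the only signal is the gap between the current RTT and the lowest RTT it has observed.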
Our little network runs this coupled with fair queueing in both directions... that is, we not only control the outgoing bandwidth, we also pipe all the incoming bandwidth through a well connected colo and control that too, before it runs over the terminal broadband links. This allows us to run FAIRQ in both directions in addition to reserving bandwidth for TCP acks and breaking down other services. FAIRQ always works much better when links are only modestly overloaded and not completely overloaded. Frankly we don't have much of a choice, we HAVE to do this because our last-leg broadband links are 100% saturated in both directions 24x7. Anything short of that and even a single video stream screws up the latency for other connections beyond hope.
This sort of solution works great near the edges.
For the center of the network, frankly, I think about the best that can be done is modest buffering and RED and then trying to reduce the load on the buffers in the center with algorithms run on the edges (that can sense end-to-end latency). The modest buffering is needed for the edge algorithms to be able to operate without bits of the network having to resort to dropping packets. In other words, you want the steady state load for the network to not have to drop packets. Dropping packets should be reserved for the case where the load changes too quickly for the nominal algorithms to react. That's my opinion anyhow.
-Matt
Re:Cringely again... by WaffleMonster · 2011-12-02 19:51 · Score: 4, Interesting

So wouldn't the right way to go be to update TCP for the times? I mean we didn't slow computers down so we could keep PATA or PCI, we came up with new tech like SATA and PCIe to take advantage of the faster throughput. Shouldn't we do the same here as well?
We have SCTP which was intended to replace TCP except nobody seems to care.
At the end of the day the concept of TCP is not rocket science - there is a limit and diminishing returns to what more can be done toward making TCP a perfect reflection of the concept of TCP.
Congestion management and ack/windowing have certainly evolved into high arts... but fundamentally all TCP does is implement a loss free ordered data stream on top of an unordered lossy packet switched network.
This means your core limitation is embedded in the definition of TCP itself...the problem of head-of-line blocking. By using TCP you are by definition limiting yourself to the constraints of TCP.
Realtime voice/video and multi-player games use their own protocols because they are not willing to accept the constraints of TCP. It is not the implementation of TCP that is holding them back. It is the *concept* of TCP.
In my opinion we need more IP protocols to better handle varied use cases more than we need a new TCP.
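Head-of-line blocking falls straight out of the "ordered stream" requirement, and a few lines are enough to show it. This is a minimal sketch of in-order delivery, not any real TCP stack; the function name and the sequence numbers are invented for the example.

```python
def deliver_in_order(arrivals):
    """Given segment sequence numbers in arrival order, return what the
    application is handed after each arrival when delivery must be in order."""
    pending = set()
    next_seq = 0
    handed_over = []
    for seq in arrivals:
        pending.add(seq)
        released = []
        # Only a contiguous run starting at next_seq may be released.
        while next_seq in pending:
            pending.remove(next_seq)
            released.append(next_seq)
            next_seq += 1
        handed_over.append(released)
    return handed_over


# Segment 0 is lost and retransmitted, so it arrives last.
print(deliver_in_order([1, 2, 3, 0]))   # [[], [], [], [0, 1, 2, 3]]
```

Segments 1 to 3 are sitting in the receiver the whole time, but the application sees nothing until 0 shows up. For a voice frame or a game update that is already stale, that wait is pure loss.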
Re:Cringely again... by evilviper · 2011-12-02 22:13 · Score: 4, Interesting

Even Cringely points out at the first of his article that originally TCP was written for a VASTLY different and weaker network than we have now, so instead of trying to make the networks go back to a mid 1980s design, wouldn't it be smarter just to update TCP to take advantage of new tech advances?
There's nothing about a "weaker" network that necessitates a protocol redesign. TCP has had problems with congestion handling from day one, that have necessitated a million and one hacks and workarounds, because it stupidly conflates packet loss with congestion... Some links will have packet loss without any congestion, and others (like these with huge buffers) will have congestion without (immediate) packet loss. It was a bad design decision.
What's worse is that IP was designed correctly to begin with. The original design has ICMP control messages (eg. source-quench) to signal congestion, much like many other networking protocols. The real problem was that the specifics were vague, and there was no exact standard on how much to slow down, how it affects higher level protocols, etc., so it became a prisoner's dilemma, and highly unfair, and was deprecated.
Of course, this problem could occur with TCP's congestion control just as easily if any particular implementations reduced the rate of exponential backoff, so there's nothing fundamentally wrong with the original congestion control design, just the lack of consistent implementation.
Controlling congestion by dropping packets is like controlling freeway traffic by randomly pushing cars off the road with a bulldozer.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Push or Pull by JoeMerchant · 2011-12-02 14:37 · Score: 5, Funny

To configure your active queue management, the first thing I need to know is: do you have a push system, or a pull system?

Neither, sir, we have a suck system.

Re:Is this a problem? by Anonymous Coward · 2011-12-02 14:38 · Score: 4, Insightful

You never want a full buffer. At that point, it ceases to do its job.

Re:Is this a problem? by skids · 2011-12-02 14:39 · Score: 4, Informative

Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.

--
Someone had to do it.

Re:Is this a problem? by CyprusBlue113 · 2011-12-02 14:43 · Score: 5, Informative

That is actually the exact problem. You do not want buffers larger than the flight time of your circuit. You absolutely want the buffers to fill and drop packets otherwise.

--
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya

Re:Is this a problem? by pla · 2011-12-02 15:00 · Score: 5, Informative

Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.

Yes and no...

If you don't (or only rarely) fill your buffer, a smaller buffer introduces less latency than a large one, while still allowing you to maximize throughput. If, however, you usually have your buffer full, you increase latency for literally no benefit, since you've already maximized throughput simply through resource demand.

The former will occur when your average load falls below your actual bandwidth, and allows you to get the most out of your link. The latter occurs when you consistently exceed your bandwidth, in which situation you may as well not even have a buffer, because it only increases latency without increasing throughput. That describes TFA's real point.

What he suggests amounts to actively choosing between those two conditions - If your average demand falls below your link speed, a larger buffer will help smooth the load over time. If, however, your average demand exceeds your link speed, throw away the buffer because it doesn't help.

But as per the GP's point - If you have an always-full buffer, you literally gain nothing but latency.
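The two cases above can be checked with a very crude fluid model. This is only a sketch under simplifying assumptions (constant offered load, one FIFO drop-tail buffer, one-second time steps); the function and parameter names are made up for the example.

```python
def fifo_steady_state(offered_pps, link_pps, buffer_packets, seconds):
    """Return (average throughput in packets/s, final queueing delay in seconds)."""
    queue = 0.0
    sent = 0.0
    for _ in range(seconds):
        queue += offered_pps                 # packets arriving this second
        out = min(queue, link_pps)           # the link drains at most link_pps
        queue -= out
        sent += out
        queue = min(queue, buffer_packets)   # drop-tail: the excess is discarded
    return sent / seconds, queue / link_pps


# Demand above link speed: same throughput, wildly different latency.
print(fifo_steady_state(120, 100, 50, 1000))     # (100.0, 0.5)
print(fifo_steady_state(120, 100, 5000, 1000))   # (100.0, 50.0)
# Demand below link speed: the buffer never fills, so its size is irrelevant.
print(fifo_steady_state(80, 100, 5000, 1000))    # (80.0, 0.0)
```

In the overloaded case the large buffer buys exactly zero extra throughput and a hundred times the delay.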

Re:Is this a problem? by CyprusBlue113 · 2011-12-02 15:08 · Score: 5, Informative

The problem with buffers is that almost all of the time they are configured by size in bits. They need to be sized based on the bit flight time of the circuit, which is delay (in ms) times throughput (in bits per second). The disconnect between those values is a problem in *either* direction, especially past the retransmit threshold on the above side.

Buffers should be dynamically sized based on flight time of data on the specific link, and ideally kept updated. WRED is also highly suggested.

What really exacerbates the issue is devices with buffers that must be the same size for all links on X (be it card, slot, or chassis).
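The "flight time" sizing above is the usual bandwidth-delay product. A minimal sketch of the arithmetic, with a made-up function name and made-up link figures:

```python
def bdp_bytes(link_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes in flight on a link of the given
    rate and round-trip time."""
    return link_bits_per_s * rtt_ms / (8 * 1000)


# Hypothetical links, both with a 40 ms round trip.
print(bdp_bytes(10_000_000, 40))      # 50000.0 bytes for 10 Mbit/s
print(bdp_bytes(1_000_000_000, 40))   # 5000000.0 bytes for 1 Gbit/s
```

A fixed byte count that is about right for the gigabit link is a hundred times too deep for the 10 Mbit/s one, which is the problem with one buffer size shared across every link on a card or chassis.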

--
a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya

Re:Is this a problem? by icebike · 2011-12-02 15:22 · Score: 4, Informative

Seems so, but isn't. For TCP traffic, a shallow buffer that drops traffic will result in more goodput than a deep buffer. Which is the point.

Exactly.

Early Congestion notification along with ONLY a minimal amount of client side buffering is really all you need.
Deep buffers just make it worse for everyone.

Oh, and just as a Car Analogy is inappropriate to describe TCP traffic, the Airplane Analogy is worse.

--
Sig Battery depleted. Reverting to safe mode.

Re:SPAM by jibjibjib · 2011-12-02 15:22 · Score: 4

Maybe posting a new article on an issue that was also an issue a year ago is not a "dupe", but an acceptable and possibly even normal thing for a news site to do?

Re:Alarmism? by Anonymous Coward · 2011-12-02 15:25 · Score: 5, Informative

Except it is alarmist. The current situation isn't optimal, but not being optimal and having a critical issue are two different things. The crux of the problem is basically "Long delays from bufferbloat are frequently attributed incorrectly to network congestion, and this misinterpretation of the problem leads to the wrong solutions being proposed." That means administrators *might* mistake large-buffer slowdowns for other causes of network congestion. Ideally, it should definitely be dealt with better, but it's hardly a collapse of the network.

A network buffer acts just as that, a buffer to smooth out traffic spikes. A buffer does this at the cost of latency. If a buffer is large AND consistently full, that means the network link is always fully utilized, so the large buffer isn't needed and just adds latency on top of waiting for the link to clear, for no benefit (the "danger" is basically that the extra latency *may* confuse administrators). On the other hand, if the link is underutilized most of the time, then a large buffer is beneficial for dealing with traffic spikes. The majority of networks are the latter and hence designed as such. Two solutions: get faster links or deal with it more intelligently.

A lot more to interoperate with by tepples · 2011-12-02 15:51 · Score: 5, Insightful

A replacement for PATA or PCI has to interoperate only with other components in the same chassis, or possibly on the same desk in the case of eSATA and Thunderbolt. A replacement for TCP would have to interoperate with every other computer in the world. Imagine what a flag day that would be.

Re:SPAM by phayes · 2011-12-02 15:58 · Score: 4, Insightful

And just maybe some of us are interested in how research has progressed since the last article...

--
Democracy is a sheep and two wolves deciding what to have for lunch. Freedom is a well armed sheep contesting the issue

Re:Is this a problem? by ObsessiveMathsFreak · 2011-12-02 16:37 · Score: 5, Interesting

What we need is a ferry analogy.

Packet transmission is like a ferry, crossing a river at fixed intervals. But the ferry sets off when it is full rather than at set times.

People wait at the shore and generally don't have to wait too long as the ferry is pretty fast and only needs a few people to fill up. For most people, walking onto the ferry involves very little waiting before the ferry actually departs and crosses the river.

Buffer bloat is when big buffers act like ferries with huge capacity. People enter a huge 2000 passenger capacity boat, and are let on by their hundreds with seemingly no delay. But the ferry will not depart until it is reasonably full. So the people who got on first may have to wait for hours before the ferry actually departs and crosses the river.

It is clear that bigger ferries are no substitute for more ferries....or smaller rivers. Or possibly a bridge. In any case, you can get away without introducing cars or airplanes, so my job is done here.

--
May the Maths Be with you!

I've definitely noticed it on my DSL by Just+Brew+It! · 2011-12-02 17:06 · Score: 4, Interesting

As soon as I start trying to shove (or suck) more bits through the pipe than it can handle, round trip latency to "nearby" points of the Internet increases from ~25 ms to ~1 second. When I need to transfer a lot of data, I use rsync or wget if at all possible, and throttle the transfer to just below the rate the connection can handle; this results in ping times staying sane while only slowing down the transfer slightly. We shouldn't need to resort to doing stuff like this to make the network function properly!
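The throttling trick above (rsync and wget expose it as `--bwlimit` and `--limit-rate`) is usually some form of token bucket. Here is a minimal sketch of one, with an explicit clock so it can be followed by hand; the class and method names are made up, and it makes no claim about how either tool implements its limit.

```python
class TokenBucket:
    """Toy rate limiter: tokens refill at rate_bytes_per_s up to burst_bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def delay_before_send(self, nbytes, now):
        """Account for nbytes at time `now` (seconds) and return how long the
        caller should sleep before actually putting them on the wire."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= nbytes            # may go negative: that is the debt
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# Hypothetical uplink that manages ~1000 bytes/s; stay at or just under it.
bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1000)
print(bucket.delay_before_send(1000, now=0.0))   # 0.0, burst allowance covers it
print(bucket.delay_before_send(500, now=0.0))    # 0.5, must wait for a refill
```

Keeping the rate a little under what the link can carry means the queue builds in the application, where it costs nothing, instead of in the modem, where it delays everything else.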

article's analogy is like by OrangeTide · 2011-12-02 18:01 · Score: 4, Funny

This analogy is like a bathtub, full of spiders, and on fire. It sounds dangerous, but it's self limiting.

--
“Common sense is not so common.” — Voltaire

Buffer bloat animation by Twinbee · 2011-12-02 18:11 · Score: 5, Informative

I thought this animation by Richard Scheffenegger was a good way to show what's happening: http://www.skytopia.com/project/articles/lag/nam00000.avi Here's a description of the video:

The bad Bufferbloat setup is on the left (yellow dots), and the 'good' setup (i.e. how things used to be configured about 10-20 years ago when RAM was more expensive!) is on the right (cyan/blue dots).

Both sides start off okay, but notice how the left side 'queues' (tall yellow dot columns) keep on growing over time, while the right side blue columns stop short because of the small buffer size. As they stop short, some data 'packets' must be dropped, and this gets reported back to the upload site that it's shoving data to the user too fast. As a result, the upload site temporarily slows the sending of data, and thus the system self-corrects.

Meanwhile, on the left side, these packets of data never get dropped, so the giant bloated yellow buffers get filled more and more, but the computer at the upload site doesn't realise the carnage of these giant queues further down the line, and instead thinks "All is okay, let's keep sending data fast!".

Finally, when a smaller piece of data needs to be sent to the user (see 2:30+ signified by red dots on the left and dark blue dots on the right), the left side shows the red dots (which could be say, a small email) wading through giant queues to reach their destination, really slowly. Furthermore these tiny bits of data often need special 'emergency' treatment as they hold up other larger data associated with it. On the good right side, the dark blue dots have no such giant queues.
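For anyone who can't play the video, the self-correcting behaviour it shows can be mimicked with a toy model. This is only a sketch: one sender with a very simplified additive-increase / halve-on-loss window, one drop-tail buffer, and made-up names and numbers throughout.

```python
def simulate_aimd(bdp_packets, buffer_packets, base_rtt_ms, rounds):
    """Return (worst RTT seen in ms, number of loss events) for one sender."""
    window = 1.0
    worst_rtt = float(base_rtt_ms)
    drops = 0
    for _ in range(rounds):
        # Anything beyond what the pipe itself holds sits in the buffer.
        queued = max(0.0, window - bdp_packets)
        if queued > buffer_packets:
            drops += 1                       # buffer overflow: sender sees a loss
            window = max(1.0, window / 2)    # and halves its window
            continue
        rtt = base_rtt_ms * (1 + queued / bdp_packets)
        worst_rtt = max(worst_rtt, rtt)
        window += 1                          # no loss: keep speeding up
    return worst_rtt, drops


print(simulate_aimd(10, 5, 100, 200))      # small buffer: RTT capped at 150.0 ms
print(simulate_aimd(10, 1000, 100, 200))   # bloated buffer: (2000.0, 0)
```

With the small buffer the sender gets told to slow down every few round trips and latency stays bounded. With the bloated one it is never told anything, and the round trip time climbs to twenty times the baseline.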

--
Why OpalCalc is the best Windows calc

Re:Is this a problem? by skids · 2011-12-02 18:20 · Score: 4, Informative

That analogy doesn't quite do the trick. TCP windowing is a bit more sophisticated than that. You can think of it maybe as a commander sending couriers out to support a mobile squad through hostile territory. If too many of them never make it to the squad, or back, he sends them less frequently so they can sneak through more discreetly. If the troops make it through then he sends them faster because the more ammo he can get through the better. But he also has to decide how many men to put on courier duty. If the couriers take too long the squad has obviously moved further away from the base, and if he waited for the next one to return, he wouldn't be sending enough ammo. If the couriers return quickly, he can make do with fewer couriers.

Big buffers are like a flimsy rope bridge in the courier's path that takes a long time to cross. Couriers have to wait on one side because only one can cross at once, but the large group waiting at the side of the cliff is more likely to get attacked. Until they do get attacked, however, the commander starts to think the squad has moved very far away, so he puts more couriers on duty. Since he thinks the squad is far away and is not expecting the couriers to return for a longer amount of time, it takes him longer to realize that they are starting to go missing entirely.

One of the best solutions to this problem turns out to be for some of the couriers to randomly go AWOL, and for more of them to go AWOL the bigger the crowd at the rope bridge gets. This basic concept is called Random Early Discard, and there have been a lot of ways invented for deciding who goes AWOL and why. If some of the couriers go AWOL, the commander thinks they are being attacked, so he slows down and also takes some troops off courier duty.
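Outside the analogy, the "more go AWOL the bigger the crowd" rule is just a drop probability that ramps up with the average queue length. A minimal sketch of the classic linear ramp, with made-up threshold values; real implementations add refinements this leaves out.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Probability of dropping an arriving packet given the average queue length."""
    if avg_queue < min_th:
        return 0.0                      # short queue: never drop
    if avg_queue >= max_th:
        return 1.0                      # queue too long: drop everything
    # In between, ramp linearly from 0 up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average, so short bursts are tolerated."""
    return (1 - weight) * avg_queue + weight * current_queue


# Hypothetical thresholds: start dropping at 10 packets, give up at 30.
for q in (5, 20, 30):
    print(q, red_drop_probability(q, min_th=10, max_th=30, max_p=0.1))
```

Because the decision uses the averaged queue rather than the instantaneous one, a brief burst gets through untouched while a standing queue starts costing the senders packets early.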

--
Someone had to do it.

Lag-o-Meter-of-Internet-Doom by WaffleMonster · 2011-12-02 18:49 · Score: 4, Interesting

If you look at buffers allocated to fast multi-gigabit interfaces at the core of the network they are simply not large enough compared to forwarding rates involved to be able to induce the kinds of delays needed to cause Internet wide problems.

You can argue they may not be ideal for real time voice, game or video communication when these links are oversubscribed but no doomsday is possible.

Today buffer bloat effects are mostly observed at the edge even though they need not always be.

Failure of a congestion control algorithm to control link saturation does not translate into congestive collapse of the larger network. It just results in *your* network connection turning to shit. When Netalyzr runs it intentionally saturates your link at that time. In the real world only a few portions of the edge are ever saturated to the extent congestion control failure becomes an issue leading to more packets through core routers. The number of edge machines in this category would need to be significant to cause a rerun of previous issues.

That condition can not be met due to self feedbacks. If everyone maxed their pipes at once the core would saturate self-limiting edge saturation due to gross over-provisioning of available edge bandwidth in relation to core bandwidth which would ensure congestion control algorithms function properly.

I'm not arguing there is not a problem or more can't be done. I'm just arguing the doomsday congestive collapse scenario is bullshit.

Doing it wrong, again by Animats · 2011-12-02 19:56 · Score: 4, Interesting

That's a pretty simplified way of putting it, but basically correct. Major equipment vendors have been slow to adopt more advanced queuing strategies (Stochastic Fair Queuing integrated with some of the more advanced flavors of early discard.)

Right. The problem is not big buffers, per se. It's big dumb FIFO queues. There's nothing wrong with one big flow, like a file transfer, having a long latency, provided that other flows with less data in flight aren't stuck behind it. That's what "fair queuing" is all about. Each flow has its own queue, and the queues are serviced in a round-robin fashion. (With stochastic fair queuing, some hashing is done to eliminate some of the bookkeeping on flows, but the effect is roughly the same.)
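A minimal sketch of the per-flow round-robin idea, to show why the small flow stops caring how much the big one has queued. It serves one packet per flow per round and ignores packet sizes, which real fair queuing schemes account for; the function and flow names are made up.

```python
from collections import OrderedDict, deque


def round_robin_schedule(packets):
    """packets: (flow_id, payload) pairs in arrival order.
    Returns the order in which a per-flow round-robin scheduler sends them."""
    queues = OrderedDict()
    for flow, payload in packets:
        queues.setdefault(flow, deque()).append(payload)

    order = []
    while queues:
        for flow in list(queues):           # one packet from each backlogged flow
            order.append((flow, queues[flow].popleft()))
            if not queues[flow]:
                del queues[flow]
    return order


# A bulk transfer has four packets queued when a single ssh keystroke arrives.
arrivals = [("bulk", i) for i in range(4)] + [("ssh", 0)]
print(round_robin_schedule(arrivals))
```

In a single FIFO the ssh packet would go out fifth; here it goes out second, no matter how deep the bulk flow's queue is.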

I figured this out in the early 1980s (see RFC 970) and by the late 1990s, it was an established technology. We shouldn't be having this problem at this late date.

I wonder how much of the trouble comes from devices that are doing TCP-level processing in the middle of the network. Stateful firewalls and ISP ad-insertion engines can introduce substantial latency.

If you want to test for bad behavior, try running two flows, one that never has more than one packet outstanding, and one that just does a big file-transfer like operation like a download. If the latency of the low-traffic flow goes up to the same as that of the bulk flow, there's a big dumb buffer in the middle. If the packet loss rate of the low-traffic flow goes up, there's a small dumb buffer in the middle.

Re:Is this a problem? by syousef · 2011-12-02 22:12 · Score: 5, Funny

That is actually the exact problem. You do not want buffers larger than the flight time of your circuit. You absolutely want the buffers to fill and drop packets otherwise.

You talkin' smack, fool? I will end you! I bloat like a buffer, sting like a TCP!

--
These posts express my own personal views, not those of my employer

Re:How can I improve my own connection? by Chirs · 2011-12-03 08:58 · Score: 4, Interesting

As an end-user there are only a few things you can do:

1) Reduce the outgoing tcp queue size.
2) Reduce the tx ring buffer size in the network device driver
3) Set your router's upstream quality-of-service settings to throttle your upstream data transmission rate to just less than your upstream bandwidth.

Alternately, if you only have one heavy user of upstream bandwidth you could do something like what is described at "http://wanners.net:8000/blog/2011/05/zapping-upload-bufferbloat-with-one-command/". Basically throttling the upstream bandwidth directly on the machine in question rather than on the router.
