Controlling Bufferbloat With Queue Delay

Posted by Soulskill on Tuesday May 8, 2012 @04:19PM from the more-effective-than-harsh-language dept.

CowboyRobot writes "We all can see that the Internet is getting slower. According to researchers, the cause is persistently full buffers, and the problem is only made worse by the increasing availability of cheap memory, which is then immediately filled with buffered data. The metaphor is grocery store checkout lines: a cramped system where one individual task can block many other tasks waiting in line. But you can avoid the worst problems by having someone actively managing the checkout queues, and this is the solution for bufferbloat as well: AQM (Active Queue Management). However, AQM (and the metaphor) break down in the modern age when queues are long and implementation is not quite so straightforward. Kathleen Nichols at Pollere and Van Jacobson at PARC have a new solution that they call CoDel (Controlled Delay), which has several features that distinguish it from other AQM systems. 'A modern AQM is just one piece of the solution to bufferbloat. Concatenated queues are common in packet communications with the bottleneck queue often invisible to users and many network engineers. A full solution has to include raising awareness so that the relevant vendors are both empowered and given incentive to market devices with buffer management.'"

25 of 134 comments

s/slower/laggier/ by diamondmagic · 2012-05-08 16:30 · Score: 5, Insightful

The Internet is not getting slower. It is becoming laggier. Come on, people, learn the difference.

--
Wonder what the public key field is for?
1. Re:s/slower/laggier/ by gstrickler · 2012-05-08 16:47 · Score: 4, Insightful
  
  And smaller buffers will help. Larger buffers do almost nothing to increase throughput, but they can increase latency. Having buffers isn't a problem. Having buffers that are too large is a problem.
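To put a rough number on the point above: a minimal sketch, assuming a single FIFO buffer that is completely full and a link that drains at a constant rate. The function name and the 1 Mb/s link speed are illustrative assumptions, not figures from the article.

```python
def queue_delay_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time a packet joining the back of a full FIFO buffer waits before it
    reaches the wire, assuming the link drains at a constant rate."""
    return buffer_bytes * 8 / link_bits_per_second


# Illustrative only: the same 1 Mb/s link with progressively larger buffers.
for size_kib in (16, 64, 256, 1024):
    delay = queue_delay_seconds(size_kib * 1024, 1_000_000)
    print(f"{size_kib:5d} KiB buffer on a 1 Mb/s link: {delay * 1000:8.1f} ms")
```

Throughput is capped by the link rate in every row; only the waiting time grows with the buffer.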
  
  --
  make imaginary.friends COUNT=100 VISIBLE=false
2. Re:s/slower/laggier/ by Xtifr · 2012-05-08 16:51 · Score: 4, Informative
  
  Yup, and another error in TFS is:
  
  According to researchers, the cause is persistently full buffers.
  should be "a cause".
  Lame, misleading summaries are par for the course around here, though. But look on the bright side--it helps keep us on our toes, sorting sense from nonsense, and helps us spot the real idiots in the crowd. :)
  At least this one had a link to a fairly reliable source. It wasn't just blog-spam to promote some idiot's misinterpretation of the facts. Might have been nice to also provide a link to bufferbloat.net or Wikipedia on bufferbloat, as well, for background information, but what can you do?
3. Re:s/slower/laggier/ by DarkOx · 2012-05-09 00:18 · Score: 2
  
  Depends...
  Suppose you have a router that has link A connected at 10 Mb/s, link B at 10 Mb/s, and link C at 300 kb/s. You have a host on the far end of A sending packets to something on the far end of C. The traffic is highly bursty. TCP does reliability end to end, so if the host on the end of C misses packets because the router discarded them, that is all traffic that has to run across link A again, which cuts down the available bandwidth for A to B. If the router had a large buffer, the burst of traffic from A for C might have been stored, preventing the retransmit on A. This works for bursty traffic; obviously the buffer will never flush if the A to C flow is continuous.
  Buffering is still important. It's just not as simple now that the internet is less bursty. More transfers are large files, streaming media, etc., and fewer are "push that e-mail message or 5K web page and done".
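The router example above can be put into numbers. This is a rough sketch, assuming the burst arrives at the full rate of link A and drains at the full rate of link C with nothing else competing; the 100 kB burst size and the function names are made up for illustration.

```python
def peak_queue_bytes(burst_bytes: float, in_bps: float, out_bps: float) -> float:
    """Largest backlog while a burst arrives at in_bps and leaves at out_bps."""
    if out_bps >= in_bps:
        return 0.0  # the output keeps up, so nothing accumulates
    return burst_bytes * (1 - out_bps / in_bps)


def drain_time_seconds(burst_bytes: float, out_bps: float) -> float:
    """Time from the first byte arriving until the last byte has left."""
    return burst_bytes * 8 / out_bps


burst = 100_000  # hypothetical 100 kB burst from A toward C
print(f"buffer needed: {peak_queue_bytes(burst, 10_000_000, 300_000):.0f} bytes")
print(f"last packet leaves after {drain_time_seconds(burst, 300_000):.2f} s")
```

The buffer has to be almost as large as the burst to avoid drops, and the last packet in the burst still waits for the slow link to drain everything in front of it. That wait is harmless for a one-off burst and is exactly the standing delay people complain about when the flow is continuous.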
  
  --
  Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Where's the incentive? by rogueippacket · 2012-05-08 16:43 · Score: 2

Today, there is no incentive for an ISP to consider spending money on this. For their private customers, they sell QoS, which guarantees their customers a better queuing method. Extremely profitable. For consumers, it makes sense to simply continue investing in infrastructure. Adding capacity from the street to the CO not only eliminates the issue, but also allows the ISP to provide better, more profitable services. In short, we will likely see better queuing methods integrated with future routers. This may be one of them, but only time will tell, and nobody will discard all of their equipment today to get it. The issue is just too minor while capacity remains cheap and QoS profitable.
1. Re:Where's the incentive? by Ungrounded+Lightning · 2012-05-08 19:10 · Score: 5, Informative
  
  Today, there is no incentive for an ISP to consider spending money on this. For their private customers, they sell QoS, which guarantees their customers a better queuing method. Extremely profitable. For consumers, it makes sense to simply continue investing in infrastructure.
  You appear to be confused about the issue. This is not about capacity and oversubscription. This is about a pathology of queueing.
  The packets leaving a router, once it has figured out where they go, are stored in a buffer, waiting their turn on the appropriate output interface. While there are a lot of details about the selection of which packet leaves when, you can ignore it and still understand this particular issue: Just assume they all wait in a single first-in-first-out queue and leave in the order they were processed.
  If the buffer is full when a new packet is routed, there's nothing to do but drop it (or perhaps some other packet previously queued - but something has to go). If there are more packets to go than bandwidth to carry them, they can't all go.
  TCP (the main protocol carrying high-volume data such as file transfers) attempts to fully utilize the bandwidth of the most congested hop on its path and divide it evenly among all the flows passing through it. It does this by speeding up until packets drop, then slowing down and ramping up again - and doing it in a way that is systematic so all the TCP links end up with a fair share. (Packet drop was the only congestion signal available when TCP was defined.)
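The speed-up-until-drop, then-back-off behaviour described above is usually modelled as additive increase, multiplicative decrease (AIMD). The following is a deliberately idealized sketch: it assumes every flow sees every congestion signal at the same moment and that all flows have the same round-trip time, which real networks do not guarantee. The function name and parameter values are illustrative assumptions.

```python
def aimd(rates, capacity, rounds, increase=1.0, decrease=0.5):
    """Simulate flows sharing one bottleneck under synchronized AIMD.

    Each round stands for one round-trip time. If the combined rate exceeds
    the bottleneck capacity, every flow multiplies its rate by `decrease`;
    otherwise every flow adds `increase`.
    """
    rates = list(rates)
    for _ in range(rounds):
        if sum(rates) > capacity:
            rates = [r * decrease for r in rates]
        else:
            rates = [r + increase for r in rates]
    return rates


# One flow starts with most of the link, the other with almost nothing.
print(aimd([80.0, 5.0], capacity=100.0, rounds=500))
```

In this model the gap between the two flows is halved at every congestion event and left unchanged by the additive steps, so the rates drift toward an equal split. How closely real TCP flows follow that ideal depends on who actually sees the drops.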
  So the result is that the traffic going out router interfaces tends to increase until packets occasionally drop. This keeps the pipes fully utilized. But if buffer overflow is the only way packets are dropped, it also keeps the buffers full.
  A full buffer means a long line, and a long delay between the time a packet is routed and the time it leaves the router. Adding more memory to the output buffer just INCREASES the delay. So it HURTS rather than helping.
  The current approach to fixing this is Van Jacobson's previous work: RED (Random Early Drop/Discard). In addition to dropping packets when the buffer gets full, a very occasional randomly-chosen packet is dropped when the queue is getting long. The queue depth is averaged - using a rule related to typical round-trip times - and the random dropping increases with the depth. The result is that the TCP sessions are signalled early enough that they back off in time to keep the queue short while still keeping the output pipe full. The random selection of packets to drop means TCP sessions are signalled in proportion to their bandwidth and all back off equally, preserving fairness. The individual flows don't have any more packets drop on the average - they just get signalled a little sooner. Running the buffers nearly empty rather than nearly full cuts round-trip time and leaves the bulk of the buffers available to forward - rather than drop - sudden bursts of traffic.
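A minimal sketch of the RED idea as described above (the original Floyd and Jacobson paper spells the acronym out as Random Early Detection): keep a moving average of queue depth and drop arriving packets with a probability that rises as the average grows. The thresholds, weight, and class name are illustrative assumptions, and the sketch leaves out refinements found in real implementations, such as spacing drops evenly and handling idle periods.

```python
import random


class RedQueue:
    """Simplified RED: early random drops driven by an averaged queue depth."""

    def __init__(self, min_th, max_th, max_p=0.1, weight=0.002, limit=100, rng=None):
        self.min_th = min_th    # below this average depth, never drop early
        self.max_th = max_th    # at or above this average depth, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # smoothing factor for the moving average
        self.limit = limit      # hard buffer limit (plain tail drop)
        self.rng = rng or random.Random()
        self.avg = 0.0
        self.queue = []

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def enqueue(self, packet):
        """Return True if the packet was queued, False if it was dropped."""
        # Exponentially weighted moving average of the instantaneous depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if len(self.queue) >= self.limit:
            return False
        if self.rng.random() < self.drop_probability():
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        return self.queue.pop(0) if self.queue else None
```

Because every arriving packet faces the same drop probability, a flow sending twice as many packets is, on average, signalled twice as often, which is the fairness property mentioned above.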
  ISPs have a VERY LARGE incentive to do this. Nearly-full queues increase turnaround time of interactive sessions, creating the impression of slowness, and dropping bursty traffic creates the impression of flakiness. This is very visible to customers and doing it poorly leaves the ISP at a serious competitive disadvantage to a competitor that does it well.
  So ISPs require the manufacturers of their equipment to have this feature. Believe me, I know about this: Much of the last 1 1/2 years at my latest job involved implementing a hardware coprocessor to perform the Van Jacobson RED processing in a packet processor chip, to free the sea of RISC cores from doing this work in firmware and save their instructions for other work on the packets.
  
  --
  Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
2. Re:Where's the incentive? by HighBit · 2012-05-09 02:00 · Score: 2
  
  You appear to be confused about the issue. This is not about capacity and oversubscription. This is about a pathology of queueing.
  To be fair, it's about both.
  Large queues are a problem, but they can be mitigated by adding more capacity (bandwidth). It doesn't matter how deep the queue can be if it's never used -- it doesn't matter how many packets can be queued if there's enough bandwidth to push every packet out as soon as it's put in the queue.
  That said, your point about AQM being a valid solution to congestion is, of course, right on:
  To avoid large (tens of milliseconds or more) queue backlogs on congested links, you use Active Queue Management. The idea with AQM is, if you have to queue packets (because you don't have enough bandwidth to push everything out in under 10 or 20 milliseconds), then start dropping packets (or ECN-marking them), so TCP's congestion control algorithms kick in.
  Dropping packets before they get put in the queue is known as tail-drop AQM. Tail-drop AQM is actually one of the worst ways to do AQM. RED (marking or dropping packets *before* the queue becomes full) and head-drop AQM are better for latency and throughput. However, even a simple tail-drop AQM can *drastically* reduce latency on an oversubscribed (congested) link. AQM really works, and it works quite well.
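The difference between tail drop and head drop comes down to which packet is sacrificed when the queue is already full. A small sketch, assuming a fixed packet-count limit; the function name and policy strings are made up for illustration.

```python
from collections import deque


def offer(queue: deque, packet, limit: int, policy: str = "tail"):
    """Try to add a packet to a bounded FIFO queue.

    Returns the packet that was dropped, or None if nothing was dropped.
    """
    if len(queue) < limit:
        queue.append(packet)
        return None
    if policy == "tail":
        return packet              # discard the new arrival
    if policy == "head":
        dropped = queue.popleft()  # discard the oldest queued packet
        queue.append(packet)
        return dropped
    raise ValueError(f"unknown policy: {policy}")


q = deque([1, 2, 3])
print(offer(q, 4, limit=3, policy="tail"), list(q))
print(offer(q, 4, limit=3, policy="head"), list(q))
```

With head drop the gap appears at the front of the queue, so the receiver, and through it the sender, can notice the loss without first waiting for the whole backlog to drain. That is one commonly cited reason it tends to do better on latency.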
  
  TCP attempts to divide traffic for different streams evenly among all the flows passing through it.
  Well, no, it doesn't. Each stream tries to fight for its own bandwidth, backing off when it notices congestion (dropped or ECN-marked packets). That means that the first stream that is going over the congested link will use the bulk of the bandwidth, because it will already be transmitting at full speed before other streams try to ramp up. The other streams won't be able to ramp up to match the first stream, as they will constantly encounter congestion, and the first stream won't back off enough to let other streams ramp up to match it. To truly enforce fairness between streams, you need AQM technologies, such as SFQ.
  
  ISPs have a VERY LARGE incentive to do this.
  ISPs certainly use AQM on their core routers, but they have an incentive NOT to use AQM where it really matters: on the congested link between your computer and the gateway. In other words, they don't set up proper AQM on the cable modem or DSL modem.
  They don't set up AQM there because they have another incentive: maximizing speed-test results. AQM by definition slows traffic down, and speed-test results are what customers seem to care about above all else. People don't call support to say they're seeing over 100ms of latency; they call support saying they're paying for 10mbits and they want to see 10mbits on the speed-test site.
  I don't have any faith that ISPs are going to fix this any time soon. However, AQM really does make a huge difference in the quality of one's internet connection. So much so that the first thing I do when setting up any new shared network (e.g. home or office network) is put a Linux box in between the cable/DSL modem and the rest of the network. There are many AQM scripts out there, but this one is mine: http://serverfault.com/questions/258684/automatically-throttle-network-bandwidth-for-users-causing-bulk-traffic/277868#277868
  My script sets up HFSC and SFQ, as well as an ingress filter, to drop packets before they start filling up the large cable/DSL modem buffers. It does a bang-up job of reducing latency; I can hardly internet without AQM in place any more.
  You can do the same thing (or at least a similar thing) with some of the SoHo Linux routers running DD-WRT and the like. Most of the scripts for those focus on QoS first and AQM second (if at all), which is a huge mistake. Maybe someday we'll have off-the-shelf SoHo routers that can do *proper* AQM. Now there's a start-up idea if I ever had one.
Remember AT&T and their 9 second 3G ping times by Zondar · 2012-05-08 16:58 · Score: 4, Interesting

Yep, same cause. They attempted to minimize packet loss by increasing the buffers in their network. The user experience was horrible.
http://blogs.broughturner.com/2009/10/is-att-wireless-data-congestion-selfinflicted.html
And why... by Anonymous Coward · 2012-05-08 17:01 · Score: 5, Interesting

Is the internet getting slower? (laggier)
because the simplest pages are HUGE BLOATED MONSTROSITIES!
Between flash and ads. And every single page loading crap from all around the world as their 'ad partners', hit counters, click counters, +1 this, like this, digg this, and all the other stupid social media crap that has invaded the web. All this shit that serves no purpose other than to some marketers. And EVERY SINGLE PAGE has to have a 'comment' section and other totally useless shit tacked on as well.
Just this little page here on slashdot. With less than a dozen replies. Tops 80k so far. And that's with everything being blocked that can be.
slower? laggier? no... the signal to noise ratio is sucking major ass.
1. Re:And why... by hvm2hvm · 2012-05-08 20:58 · Score: 2
  
  The total size is not the only thing that matters. What matters is the fact that most pages make requests to as many as 10 domains and 50 URLs when they load. That means multiple DNS requests, multiple connections, etc. There are also a lot of pages that load stuff through javascript and/or css, which adds another stage or two of loading.
  
  --
  ics
2. Re:And why... by Anonymous Coward · 2012-05-08 22:33 · Score: 2, Informative
  
  What matters is not (if we put privacy aside) that 10 domains are requested, but that there are 10 (mostly) different routes to 10 different servers. If a single one of these routes or servers is slow, the website will load slowly as well.
Summary so awful, it just hurts. by TiggertheMad · 2012-05-08 17:18 · Score: 3, Insightful

We all can see that the Internet is getting slower.

Can we? It looks like it has been getting faster to me....

According to researchers, the cause is persistently full buffers,

What researchers? What buffers? Server buffers? Router buffers? Local browser buffers? Your statements are so vague as to be useless.

and the problem is only made worse by the increasing availability of cheap memory, which is then immediately filled with buffered data.

Buffering is a way of speeding servers up immensely. Memory is orders of magnitude faster than disk, and piling RAM on and creating huge caches can only help performance. I call bullshit on your entire claim. This summary is so awful, I don't even want to read whatever article you linked to.

--

HA! I just wasted some of your bandwidth with a frivolous sig!
1. Re:Summary so awful, it just hurts. by Xtifr · 2012-05-08 17:36 · Score: 5, Informative
  
  It is definitely a terrible summary, but the ACM article it links to is actually quite interesting. (You do know what the ACM is, don't you?) And bufferbloat has nothing to do with discs, so your objection is completely off base. It certainly would have helped if the summary had given you any idea what bufferbloat is, of course, so I understand your confusion. But it's a real thing. The problem is that the design of TCP/IP includes built-in error correction and speed adjustments. Large buffers hide bottlenecks, making TCP/IP overcorrect wildly back and forth, resulting in bursty, laggy transmission.
2. Re:Summary so awful, it just hurts. by Imagix · 2012-05-08 17:38 · Score: 2
  
  Buffering is a way of speeding servers up immensely. Memory is orders of magnitude faster than disk, and piling RAM on and creating huge caches can only help performance.
  You're thinking of caching, not buffering.
Numbers & market incentives by Logic+and+Reason · 2012-05-08 17:23 · Score: 5, Interesting

We all can see that the Internet is getting slower.
Can we? I'd suggest that most people are unaware of any such trend, perhaps because it has happened too gradually and too unevenly. Indeed:

A full solution has to include raising awareness so that the relevant vendors are both empowered and given incentive to market devices with buffer management.
Exactly. Consumers don't know or care about low latency, so the market doesn't deliver it (that plus lack of competition among ISPs in general, but that's another kettle of fish).

We need a simple, clear way for ISPs to measure latency. It needs to boil down to a single number that ISPs can report alongside bandwidth and that non-techies can easily understand. It doesn't need to be completely accurate, and can't be: ISPs will exaggerate just like they do with bandwidth, just like auto manufacturers do with fuel efficiency, etc. What matters is that ISPs can't outright make up numbers, so that a so-called "40 ms" connection will reliably have lower average latency than a "50 ms" connection. That should be enough for the market to start putting competitive pressure on ISPs.

What kind of measure could be used for this purpose? Perhaps some kind of standardized latency test suite, like what the Acid tests were to web standards compliance? Certainly there would be significant additional difficulties, but could it be done?
Their problem setup is a speed boundary transition by tlambert · 2012-05-08 17:33 · Score: 3, Interesting

The boundary they are transiting is one between a fast network and a slower network, similar to what you see at a head-end at a LATA or broadband distribution point and leaf nodes like people's houses, or, on the other end, on the pipe into a NOC with multi-gigabit interconnects much bigger than the pipes into or out of the NOC.
The obvious answer is the same as it was in 1997 when it was first implemented on the Whistle InterJet: lie about the available window size on the slow end of the link so as to keep the fast end of the link from becoming congested by having all its buffers filled up with competing traffic.
In this way, even if you have tasks which would otherwise eat all of your bandwidth (at the time, it was mostly FTP and SMTP traffic), you can still set aside enough buffer space on the fast side of the router on the other end of the slow link to let ssh or HTTP traffic streams make it through. Beats the heck out of things like AltQ, which do absolutely nothing to prevent a system with a fast link that has data to send you crap-flooding the upstream router so that it has no free buffers to receive any other traffic, and which it can't possibly hope to shove down the smaller pipe at the rate it's coming in the large one.
Ideally this would be cooperatively managed, as was suggested at one point by Jeff Mogul (which is likely barred due to the lack of a trust relationship between the upstream and downstream routers, if nothing else). Think of it like half your router lives at each end of the link wire, instead of both sides living on one end.
It's the job of the device on the border that happens to know there's a pipe size differential to control what it asks for from the upstream side in terms of the outstanding buffer space it's possible for inbound packets to consume (and to likewise lie about the upstream windows to the downstream higher speed LAN on the other end of the slow link).
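One way to picture the window "lie" is a border device that rewrites the advertised receive window so that the total data in flight stays near what the slow link can carry in one round trip. This is a hypothetical sketch, not the Whistle InterJet code: the function name, the equal split between flows, and the one-segment floor are all assumptions.

```python
def clamped_window(advertised_bytes: int, slow_link_bps: float,
                   rtt_seconds: float, active_flows: int, mss: int = 1460) -> int:
    """Window a border device might advertise upstream on behalf of a host.

    The bandwidth-delay product of the slow link is split evenly between
    the active flows, and the host's own advertisement is never exceeded.
    """
    bdp_bytes = slow_link_bps * rtt_seconds / 8
    share = bdp_bytes / max(active_flows, 1)
    # Assumption: never advertise less than one full segment, otherwise
    # the sender would be reduced to sending tiny packets.
    return int(min(advertised_bytes, max(share, mss)))


# Hypothetical 1.5 Mb/s link with a 100 ms round trip.
print(clamped_window(65535, 1_500_000, 0.1, active_flows=4))
print(clamped_window(65535, 1_500_000, 0.1, active_flows=100))
```

With many flows the one-segment floor takes over and the total in flight exceeds what the link can hold, which hints at why this approach strains as the number of connections grows.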
I'm pretty sure Julian Elischer tried to push the patches for lying about window size out to FreeBSD as part of Whistle contributing Netgraph to FreeBSD.
While people are looking at that, they might also want to reconsider the TCP Rate Halving work at CMU, and the LRP implementation from Peter Druschel's group out of Rice University.
-- Terry
Re:Remember AT&T and their 9 second 3G ping ti by Crypto+Gnome · 2012-05-08 17:34 · Score: 2
A long time ago when the earth was greener someone promoted the concept of an internet with ZERO packet loss.

My InterTubes are BETTER because I HAVE ZERO LOSS!!!

Oddly enough such a business model turned out to be unsustainable due to
(1) it's financially expensive (between one thing and another)
(2) doing this the less expensive way (ie by slathering on bigger buffers) introduces excessive latency (for some customer designated value of "excessive")

For the life of me I don't understand how ANYBODY can be allowed to run a company without at least vaguely understanding the concept TANSTAAFL.
- You cannot change the laws of physics
- Perpetual Motion never is
- There is no Miracle Cure
- There is no Get Rich Quick Scheme
And, finally: No that Hot Blonde Supermodel with MASSIVE Bazingas does NOT find you attractive, not in the slightest, no matter how much she may drink/snort/inject or pop.

If you want to fix throughput issues without spending lots of money or sacrificing latency then you're going to need a better algorithm (yes folks, hard research and careful thinking).
--
Visit CryptoGnome in his home.
Re:Maybe by Crypto+Gnome · 2012-05-08 17:37 · Score: 3, Funny

Slashdot already has the equivalent: ACM (Anonymous Coward Management).

They also have ATF (Assisted Troll Flagging) as a kind of belt-n-suspenders thing.

--
Visit CryptoGnome in his home.
Paper is ambiguous about what gets dropped by Animats · 2012-05-08 17:47 · Score: 3, Insightful

It's not clear from the paper whether packet dropping is per-flow, in some fair sense, or per link. There's a brief mention of fairness, but it isn't explored. It sounds like the new approach has no built-in fair queuing.
Without fair queuing, whoever sends the most gets the most data through. Windows (especially) starts up TCP connections by sending as many packets as it can at connection opening. There used to be a convention in TCP called "slow start", where new connections started up sending only two packets, increasing if the round trip time turned out to be good. That was too pessimistic. But Windows now starts out by blasting out 25 or so packets at once. This hogs the available bandwidth through everything with FIFO queues.
If the routers at choke points (where bandwidth out is much less than bandwidth in, like the entry to a DSL line) do fair queuing by flow, the problem gets dealt with there, as the excessive sending fights with itself, trailing packets on the biggest flows are sent last, and everything works out OK.
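Fair queuing by flow can be sketched as one queue per flow, served round-robin. This is a simplified illustration that counts packets rather than bytes and keeps exact per-flow state; schedulers such as SFQ or DRR hash flows into buckets and account for packet sizes. The class and flow names are hypothetical.

```python
from collections import OrderedDict, deque


class FlowFairQueue:
    """Per-flow FIFO queues served one packet at a time in rotation."""

    def __init__(self):
        self.flows = OrderedDict()  # flow id -> deque of waiting packets

    def enqueue(self, flow_id, packet):
        self.flows.setdefault(flow_id, deque()).append(packet)

    def dequeue(self):
        if not self.flows:
            return None
        flow_id, packets = next(iter(self.flows.items()))
        packet = packets.popleft()
        del self.flows[flow_id]
        if packets:
            self.flows[flow_id] = packets  # back of the rotation
        return packet


fq = FlowFairQueue()
for i in range(5):
    fq.enqueue("bulk", f"bulk-{i}")  # a large transfer queues five packets
fq.enqueue("ssh", "ssh-0")           # then a single interactive packet arrives
print([fq.dequeue() for _ in range(6)])
```

In a plain FIFO the interactive packet would leave last; here it leaves second, while the bulk flow only delays its own trailing packets.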
"Bufferbloat" is only a problem when a small flow gets stuck behind a big one. A flow getting stuck behind the preceding packets of the same flow is fine; you want those packets delivered. Packet reordering is better than packet dropping, although more computationally expensive. Most Cisco routers offer it on slower links. Currently, this means links below 2Mb/s, which is very slow by modern standards. That's why we still have kludgy solutions like RED. This new thing is a better kludge, though.
1. Re:Paper is ambiguous about what gets dropped by jg · 2012-05-09 01:04 · Score: 2
  
  The article's subtitle is: "A modern AQM is just one piece of the solution to bufferbloat." We certainly expect to be doing fair queuing and classification in addition to AQM in the edge of the network (e.g. your laptop, home router and broadband gear). I don't expect fair queuing to be necessary in the "core" of the network.
  I'll also say that an adaptive AQM is an *essential* piece of the solution to bufferbloat, and a piece we've had no good solution to (until, we think, now).
  That's why this article represents "fundamental progress".
My internets fine by rhade · 2012-05-08 18:32 · Score: 4, Insightful

We all can see that the Internet is getting slower *Citation needed* Have you tried turning your modem off and on again?

--
http://www.awfullybigmoustache.com
You know as a species you're doing it wrong when by clickclickdrone · 2012-05-08 19:11 · Score: 3, Funny

My first thought after reading the story was 'Hope whoever patents those ideas doesn't charge too much for them.'

--
I want a list of atrocities done in your name - Recoil
Re:Remember AT&T and their 9 second 3G ping ti by SuricouRaven · 2012-05-08 20:31 · Score: 2

Correction: There is no get-rich-quick scheme with a high probability of success. There are a few (like the lottery) which may get you rich quick, but with only a small probability.
Re:Active Queue Management by arth1 · 2012-05-08 23:17 · Score: 2

Mathematically optimal too, providing all customers/packets take equal time to process. The only problem is that in the real world it requires awkward physical queue layouts.
Yeah, it is irritating when you arrive at the airport at five o'god in the morning to catch a flight leaving at nine, and as the only customer, you have to walk a labyrinth a mile long to get to the way too cheerful check-in assistant who is patiently waiting.
Net result: Added latency.
Then you hit the security line, which is already full, presumably by people who spontaneously spawned from the walls between check-in and security. And in the security line, this one line approach does not appear to help speed at all.
Re:Their problem setup is a speed boundary transit by m.dillon · 2012-05-09 05:32 · Score: 2

It's a part of the solution but not a complete solution. Boundary problems are different from center-of-network problems. Playing with TCP window sizes only works well at the edges and only in the outgoing direction (the 'inflight' sysctls that we've had forever), but does not completely solve the problem because you always need to add one or two additional packets above and beyond the calculated sweet spot to absorb changes in latency from other links and give the algorithm time to respond whenever reality changes.
This solution in both incoming and outgoing directions suffers from a connection multiplication problem. That is, it works fine if you have only a few simultaneous TCP connections running but it breaks down when you have dozens or hundreds due to the need to have 1-2 extra packets of slop in the reported window. It degrades gracefully in the outgoing direction but blows up very quickly when you are trying to control bandwidth in the incoming direction by changing the window size you report available in the outgoing ACKs (anyone running torrents can tell you this but the problem occurs for any busy network).
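The connection multiplication problem is easy to see with rough arithmetic. A sketch, assuming the worst case where every connection's 1-2 packets of slop sit in the bottleneck queue at the same time; the 1460-byte segment size, the 1 Mb/s link, and the function names are illustrative assumptions.

```python
def standing_queue_bytes(connections: int, slop_packets: int = 2, mss: int = 1460) -> int:
    """Worst-case backlog if every connection's slop is queued at once."""
    return connections * slop_packets * mss


def added_delay_ms(connections: int, link_bps: float,
                   slop_packets: int = 2, mss: int = 1460) -> float:
    """Extra queueing delay that backlog adds on a link of the given rate."""
    return standing_queue_bytes(connections, slop_packets, mss) * 8 / link_bps * 1000


for n in (5, 20, 100):
    print(f"{n:3d} connections: {standing_queue_bytes(n):7d} bytes queued, "
          f"about {added_delay_ms(n, 1_000_000):7.1f} ms on a 1 Mb/s link")
```

A handful of connections adds a delay that is tolerable; a hundred adds seconds, even though each individual window is already as small as it can usefully be.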
However, bandwidth limiting in this fashion DOES reduce packet backlogs and queues significantly. Not enough (it's simply impossible to make the algorithm stable at the sweet spot so you always need 1-2 additional packets), but significantly. Hence it is part of the solution.
Step 2: On speed boundary changes you have to run fair queuing, period. Not only that but you have to do it on both sides of the boundary (i.e. in both directions each at the choke point). In my case I could never get truly reliable operation by only running fair queuing in the outgoing direction. I had to actually run the fair queue on *both* ends of the link, meaning I had to colocate a server to serve as the terminus for a VPN and run all the traffic over the VPN so I could control both ends. The fair queue also has to reserve bandwidth for pure TCP ACKs to prevent restricting the bandwidth in the opposite direction due to ACK starvation.
Fair queuing takes care of all remaining packet buffering issues at the edges of the network.
Center-of-network issues can't use the above solutions simply because there are too many connections flowing through the center of the network to track and not enough packet buffer space to sufficiently buffer all those connections for fairq operation (N x B is just too big). It's impossible to calculate where the choke points are based on connection tracking at the center-of-network. Easy at the edges, impossible in the center.
So the center-of-network has to provide additional congestion control through some sort of AQM, and if early-warning requests go unheeded it must start dropping packets. Personally speaking I hate the idea of having to drop packets, it leads to all sorts of problems everywhere, but the simple fact of the matter is that there is not enough per-connection buffer space at the center of the network to run fairq, nor is it an appropriate place for that. Without tracking you cannot mess with tcp window sizes in the center-of-network and even if you could track you can't bundle the connections based on where the choke point is at the moment, there is simply not enough information.
So we are talking at least three and probably closer to half a dozen different mechanisms being needed to reduce network latencies and still provide good and fair performance.
-Matt