Bufferbloat only occurs at the bottleneck, i.e. the slowest link on the path. This is usually the phone on the uplink, and the eNodeB or VPN backhaul on the downlink. The bottleneck can certainly be elsewhere in the network, but those are the vast majority of cases.
8 years of R&D, and having fixed wifi thoroughly, with work shipping in both Linux and OSX? BQL + RFC8290/fq_codel for ethernet, SQM or sch_cake for Linux on cable, DSL & fiber, fq_codel + ATF for wifi:
https://www.usenix.org/system/...
In wifi, now, at least, uplinks are easily controlled at the user device (phone), which is what I was mostly measuring. The same principles apply to LTE and 5G.
In LTE, for downlinks, you need help at the eNodeB and backhaul. Those suck too, but "only" to about 600ms max latency under load.
https://www.ericsson.com/en/er...
It's just intolerable on LTE and 5G. This was a test I ran today of bloat on a tethered Android cell phone - 2+ seconds of observed delay: http://www.dslreports.com/spee...
Early tests of 5G are equally dismal, with over 1.5 seconds of observed latency under load.
As OSX adopted fq_codel (RFC8290) last year for its wifi drivers, awareness of this problem has finally made it to at least some of Apple's upper management. Here's hoping it's made it to the LTE folk there also!
My wifi lab is located deep in the Los Gatos hills, in a naturist colony and 110-acre campground called Lupin Lodge.
Luckily I was not fiddling with new frequencies and stuff that required FCC approval, because, well... suits aren't welcome here, and there's essentially no dress code. Wifi's pretty good, though...
( https://www.usenix.org/system/... )
If only I could get every slashdotter to take an hour out from flaming and look over the Mill architecture diagram:
http://millcomputing.com/wiki/...
Or burn an hour grokking some part of it they might want to understand (Ivan is a trip to watch):
https://millcomputing.com/docs...
It would be a better world. The Mill folk think way out of the box.
I like to think all the work we've done in fixing bufferbloat all over the edge will make interactive gaming more popular and pleasant. It would be nice if some gaming CEO acknowledged the benefits of SQM and RFC8290!
Less bufferbloat will also make streaming games more feasible - IMHO, the biggest reason OnLive failed was the widely variable latencies they encountered while trying to shove that much data down the pipe.
I do agree strongly, however, with those who think good interactive gaming requires latencies in the very low ms range, but there's room for something between FarmVille and Call of Duty here.
If only a reboot solved all problems! Can't they also suggest reflashing with something immune to this malware like any of the third party router firmwares?
On my bad days, watching over the cyberwarfare, and now that the domain has been seized, I can imagine the FBI P0wning your router, rather than the original authors - because now they have the capability to do so.
Reboot and reflash, damn it.
A fabulous, deep, funny book on rocket fuels and the crazed chemists who developed them is "Ignition!" by John D. Clark, with a foreword by Isaac Asimov. Example text:
"Recommended lab attire for working with this volatile compound: Running shoes."
Ultimately this has the potential to recreate "sex sessions of the stars". How long will it be before everyone is able to record, play back, and relive those intimate moments? "Sex tapes" will become passé when compared to fully interactive virtual experiences being passed around, or sold, by the companies that sold you the basic hardware to play with, years after you first played with it. Maybe standard protocols for sexual interaction will emerge, like MIDI values for every basic position in the Kama Sutra... defined values for motion controllers, and the like.
Please note that I am neither praising nor condemning this future. I just happen to think that if any technology can be used during sex, it will be, and thus I tend to think that discounted interactive sex toys that do broadcast your activities will be cheaper than those that don't - much like you can get a Kindle without advertisements for more money. My nightmare, I guess, is that you'll end up seeing ads projected over the body parts of your virtual lovers if you don't pay the tax.
Recently I made a trip to Nicaragua and deployed a few fq_codel and cake equipped routers there. The effects were *marvellous*. The kind of total-failure-to-load problems Dan describes, in particular, went away.
Still, I'd like to banish those who think an IW greater than 2, TSO and GSO, and 6 simultaneous connections in web browsers are a good idea to remote locations for a while, until they learn the error of their ways.
Dear Damon:
I'm sorry, I tuned out of slashdot after a day.
"I am still baffled from an afternoon's reading round the subject if to be effecitive your anti-BB magic has to happen at (nearly) every edge device, or (nearly) every lossy (or speed-mismatched) network gap, or if BB can be fixed by judicious ISP infrastructure deployment, or would cumulatively benefit if multiple of those happened."
Better queue management everywhere would be good. Your second thought is closest to correct:
"(nearly) every lossy (or speed-mismatched) network gap" needs better queue management. That's a LOT (billions) of devices. The thing is, the queue management problem was known well before 1992; it's just that RED did not deploy very well, and FQ techniques were often kept as secret sauce. Things got out of hand as speeds went up and the potential speed mismatch between links grew to 6 orders of magnitude since 1992.
I (we) fully realize that the scope (billions of machines/year) of sticking solutions everywhere is hard, but it is never too late to start (and replacing dumb, overbuffered FIFOs everywhere with a couple hundred lines of code seems simple, considering the millions of lines elsewhere!), and we've pursued developing an easily deployable solution (fq_codel primarily), as well as standardization efforts (the IETF AQM working group). Things like systemd default to fq_codel, and so do most third party Linux router firmwares.
About the only major thing left (since fixing wifi) is actually getting this stuff into hardware and "big iron" like CMTSes and BRASes. ...
On devices themselves, we've worked on ripping out excessive buffering throughout the stack (BQL, things like TCP_NOTSENT_LOWAT (now the default in OSX), and most recently pacing via the sch_fq qdisc and "TCP BBR") so that the TCPs and applications (mostly Linux, but increasingly BSD) are not storing crazy amounts of data internally. There have been a lot of other changes; all my talks include a slide on the higher levels of stack and application issues. ...
If you have enough capacity, you don't see BB (except for microbursts, which could be quite bad before we started moderating TSO/GSO/GRO bursts). Certainly the core and a well designed infrastructure that never saturates removes the issue (except when it does happen!) ...
At some point I need to sit down and write something definitive, instead of this vapor trail of 6 years' worth of work all over all the gear and all of the stack(s).
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org/
Many enterprise APs are pretty good, btw - and while I have not tested the current crop of stuff from Eero, Google, and so on, I'm pretty sure they've been paying attention to the work. (Portions of the make-wifi-fast project were funded by both Google and Comcast Research.)
So I hope you've been making your stuff great in the first place, and not having to deal with paying off all the technical debt we've been paying off here:
https://docs.google.com/docume...
But please go test for the things we are testing for and fixing!
These are "latency under load" measurements (using the dslreports and flent tools to stress out your link). If your network is otherwise idle, VoIP is fine, but with people adding ever more devices to their network doing random things at random times, the bloat problem rears its ugly head.
(And yes, VoIP is frequently unusable when your ISP link is under stress from something else without these queue management techniques in place there.)
I tried to stress in the LWN article that first eliminating bloat from the ISP link will make your wifi a lot better, because in many scenarios wifi is not the bottleneck. But: the wifi work we just pushed upstream makes VoIP far more possible when wifi is contended.
Is it limited to Linux? No - it seems to be deeply affecting the current crop of gateways supplied by ISPs, as well as nearly the entire 3rd party router market, except those who are deploying QoS sanely (which is nearly everybody these days in third party firmware - "fq_codel" lies underneath many a rebranded QoS system nowadays). "cake" is a possible successor.
The frustrating part is that wifi folk are often saying their stuff is fine, when it can be so deeply affected by the next hop up, and also tends to become poor anytime a second or third device is stressing the link (transferring files to a NAS for one example, screensharing for another). 802.11ac devices also tend to have more latency than 802.11n, because they tend to use a fixed buffer size suitable for their highest rates, and not something that adjusts to the actual rate.
If you are interested in poking into these issues further, on your equipment, take a look at flent, and/or come on over for the discussions on the make-wifi-fast mailing list.
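To put rough numbers on the fixed-buffer point above, here is a minimal sketch in Python. The buffer size is a made-up example (sized to hold about 10 ms of data at 1 Gbit/s), not a measurement of any real 802.11ac device, and `queue_delay_ms` is just a name chosen for illustration. It only shows how long a full buffer of that size takes to drain as the actual rate drops.

```python
def queue_delay_ms(buffer_bytes: int, rate_bits_per_s: float) -> float:
    """Time to drain a full buffer at the given link rate, in milliseconds."""
    return buffer_bytes * 8 / rate_bits_per_s * 1000


# Hypothetical buffer: holds about 10 ms of data at 1 Gbit/s (1,250,000 bytes).
buffer_bytes = 1_000_000_000 // 8 // 100

for rate_mbit in (1000, 100, 10, 1):
    delay = queue_delay_ms(buffer_bytes, rate_mbit * 1_000_000)
    print(f"{rate_mbit:>5} Mbit/s -> {delay:,.0f} ms to drain a full buffer")
```

The same buffer that costs 10 ms at the top rate costs a full second at 10 Mbit/s, which is why a buffer that adjusts to the actual rate matters.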
Nice success story and the exact circumstances we were trying to make easier to solve with cake. (and the dream is more ISPs would just be doing it for you on their default supplied boxes)
I would like to benchmark more stuff like Tomato's QoS against cake. The equivalent (single!) command line for outbound would be:
tc qdisc add dev your_device root cake bandwidth 2mbit nat
which automatically applies per-host fairness, QoS, and queue length management.
Inbound requires a slightly more complex setup, but not by much.
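For the inbound side, one common pattern (an assumption here, not necessarily the exact setup meant above) is to redirect incoming traffic through an IFB device and attach cake to that. The Python sketch below only builds and prints the commands; it runs nothing. The device names `eth0` and `ifb0`, the 20mbit figure, and the function name are placeholders, so check the commands against your own system before running anything as root.

```python
def cake_ingress_commands(wan_dev: str, ifb_dev: str, bandwidth: str) -> list[str]:
    """Build (but do not run) the commands for shaping inbound traffic with cake."""
    return [
        # Create the intermediate device that inbound packets get redirected to.
        f"ip link add name {ifb_dev} type ifb",
        f"ip link set dev {ifb_dev} up",
        # Attach an ingress qdisc to the WAN device and redirect everything to the IFB.
        f"tc qdisc add dev {wan_dev} handle ffff: ingress",
        f"tc filter add dev {wan_dev} parent ffff: protocol all u32 match u32 0 0 "
        f"action mirred egress redirect dev {ifb_dev}",
        # Shape on the IFB; cake's 'ingress' keyword tells it this is inbound traffic.
        f"tc qdisc add dev {ifb_dev} root cake bandwidth {bandwidth} ingress",
    ]


# Placeholder values: substitute your own device names and a rate a bit below your downlink.
for cmd in cake_ingress_commands("eth0", "ifb0", "20mbit"):
    print(cmd)
```

Printing rather than executing keeps the sketch safe to try on any machine.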
I am not huge on basic web tests, preferring the finer grained results we get from flent (https://flent.org).
And I totally agree that the trendline is to ever more devices doing ever more stuff randomly when you least desire it. We need to have edge routers AND ISPs ready for this change in traffic patterns.
The article you cited was quite good, although it completely missed the outputs of the IETF AQM working group, of which both Fred Baker and I are members. https://tools.ietf.org/wg/aqm/
It is entirely probable we've been inside our own filter bubble so long (6 years) we cannot properly communicate with first time readers!
Some folk explaining the problem... the IETF video shows the benefit from fixing it.
https://www.bufferbloat.net/pr...
Showing the extent:
http://www.dslreports.com/spee...
You have this entirely backwards:
"Buffering can reduce latency, especially under heavy load, by better bandwidth utilization, and allowing faster retransmission of dropped packets. If it is slowing things down, then you should fix the buffering rather than eliminating it."
You want enough buffering to absorb bursts, but any more just adds latency.
Van Jacobson and Kathie Nichols call this distinction good queue and bad queue:
https://tools.ietf.org/html/dr...
Less buffering (and fair queuing) allows for faster retransmission in particular.
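A rough way to see the good queue / bad queue distinction, as a simplified Python sketch and not the CoDel algorithm itself: look at per-packet queueing delay over a window. If the delay drops back near zero at some point, the queue was just absorbing a burst. If even the minimum stays high, it is a standing queue that only adds latency. The 5 ms target is CoDel's default; the sample delays and the function name are made up for illustration.

```python
def has_standing_queue(sojourn_ms: list[float], target_ms: float = 5.0) -> bool:
    """True if even the smallest queueing delay in the window exceeded the target."""
    if not sojourn_ms:
        return False  # no packets seen, so no queue to speak of
    return min(sojourn_ms) > target_ms


# Made-up sample delays (ms) over one observation window.
burst = [0.2, 18.0, 9.0, 3.0, 0.4]             # spikes, then drains: good queue
bloated = [120.0, 135.0, 110.0, 140.0, 125.0]  # never drains: bad queue

print(has_standing_queue(burst))    # False
print(has_standing_queue(bloated))  # True
```

Using the minimum rather than the average is the point: a burst raises the average too, but only a queue that never empties keeps the minimum high.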
and his wife. Sad.
If we could only get LEDs to switch even 1/100th this fast. I think it's about 2ms now.
Just to start with: https://en.wikipedia.org/wiki/...
Ignition! has been long out of print. Thankfully archive.org has a copy here: https://archive.org/details/ig...
This algorithm enables the little guy, and is eventually fair to all. https://tools.ietf.org/html/rf...
I wish I could find the relevant passage in Orwell's 1984. Or was it Brave New World?
I'd have thought they'd have tried to build next door to CIA HQ or the NSA, or even co-located.
Bloat, yes, bufferbloat, no.
https://tools.ietf.org/html/dr...
is also a good read.
I was a bit put off by the first 25 posts being basically trollish. I have merely tried to be helpful since.
Cut-through routing works when there is no congestion. http://www.dslreports.com/spee...
It has nothing to do with writing code, but with normal users actually using the internet while contending for bandwidth.