Slashdot Mirror


A Possible Cause of AT&T's Wireless Clog — Configuration Errors

AT&T customers (iPhone users notably among them) have seen some wireless congestion in recent months; Brough Turner thinks the trouble might be self-inflicted. According to Turner, the poor throughput and connection errors can be chalked up to "configuration errors specifically, congestion collapse induced by misconfigured buffers in their mobile core network." His explanation makes an interesting read.

12 of 217 comments

  1. Hm by Anonymous Coward · · Score: 5, Insightful

    His explanation makes an interesting read.

    I'd like to think that's a given, considering it's a news story. At any rate, from TFA:

    The bottleneck link is the over-the-air link, i.e. the connection from radio access network or UTRAN to the Mobile Station (MS) in the above diagram, therefore the critical buffers are those at the UTRAN. In practice the UTRAN includes both the basestations (called Node-Bs) and the Radio Network Controllers (RNCs) which coordinate handovers between basestations (among other things). Because of hand-overs, the amount of data buffered at the Node-B is relatively small. It's the buffers at the RNC that must be large enough to deal with the delay variations in the radio network and yet small enough to induce packet loss when the network gets congested.

    I am not a network engineer, but how exactly could an 8-second ping time go unnoticed by the AT&T engineers who set up, configured, and monitored their OTA link? I would think that we're not talking about some dude's set of bridged dd-wrt linksys routers, but some serious heavy-duty RF equipment. I'm thinking on the order of several zeros...
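
    For a sense of scale on that 8-second figure: queueing delay is simply the data sitting in a buffer divided by the rate at which the link drains it. A rough sketch of the arithmetic follows. The 1 Mbit/s drain rate and both function names are assumptions picked for illustration, not numbers or names from TFA.

```python
def queueing_delay_s(buffered_bytes, drain_rate_bps):
    """Seconds a newly arrived packet waits behind buffered_bytes of queued data."""
    return buffered_bytes * 8 / drain_rate_bps


def buffered_bytes_for_delay(delay_s, drain_rate_bps):
    """Bytes that must be queued ahead of a packet to delay it by delay_s seconds."""
    return delay_s * drain_rate_bps / 8


if __name__ == "__main__":
    ASSUMED_RATE_BPS = 1_000_000  # hypothetical over-the-air rate, illustration only
    needed = buffered_bytes_for_delay(8, ASSUMED_RATE_BPS)
    print(f"{needed:,.0f} bytes queued -> {queueing_delay_s(needed, ASSUMED_RATE_BPS):.1f} s of delay")
```

    Under that assumed rate, an 8-second delay corresponds to roughly a megabyte of queued data per bottleneck, which a large buffer can absorb without ever dropping a packet.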

  2. This is impossible, I've seen the buffer settings by Anonymous Coward · · Score: 5, Funny

    You see, most blokes, you know, will be buffering at ten. You're on ten here, all the way up, all the way up, all the way up, you're on ten on your buffer. Where can you go from there? Where?

    I don't know.

    Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?

    Put it up to eleven.

    Eleven. Exactly. One more buffered.

  3. I know this first hand by NynexNinja · · Score: 5, Interesting

    I worked for AT&T in several parts of the country on their core networks. In the early 2000s they had misconfigured all of their Solaris boxes, and I worked with the infrastructure group to implement a startup script on Solaris to tune all the ndd settings for performance. The problem with Solaris is that by default all the TCP, UDP, Ethernet, etc. settings are set for a desktop workstation, not a server. Most system admins know to tune these settings; otherwise, in a lot of cases a multi-CPU box will perform as slowly as a 1-CPU box. Anyway, at specific companies I worked with (AT&T Broadband / Worldnet in St. Charles, MO was one big one), all the servers were configured without the proper settings for a server, so we had all kinds of issues. A big one is that the TCP accept queue is not set high enough, so connections to daemons get dropped after a low number of connections, making it appear that the box can't handle the load. As a result, they had spent millions on numerous servers (in one situation they had over twenty 12-CPU servers just for SMTP...

    These changes seem small. However, changing "ndd" kernel parameters on a Solaris box is not a single task; it is an infrastructure-wide task that requires the coordination of dozens of different groups, so it took a long, long time to get this script implemented. It was called "S99nddfix" and it had all the ndd tunable parameters in it. Later, when I worked at a different AT&T group in a different state, I noticed my script had been implemented on all the Solaris servers in the 200+ server environment.

  4. Re:Zero packet loss = epic fail by MBCook · · Score: 5, Informative

    Wow. This is kind of amazing.

    Nothing on this page (as I type) talks about zero packet loss, except you. That means you read the article.

    Of course, the article says that AT&T has set their buffers large enough to prevent packet loss due to congestion in transit, not that they expect no radio packet loss. The problem is that TCP/IP needs packet loss to tell it when it's going too fast and AT&T's decision causes this to fail spectacularly at times.

    The trolls read the articles. Weird.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  5. Re:Software Robustness by kaiser423 · · Score: 5, Interesting

    Blackberries are awesome about this with the bi-directional communication arrows. When I'm with friends in an area of low reception, they're all walking around randomly trying to call every two yards, and waiting 15 seconds before determining that it's not going to work. I walk around until I see an incoming arrow. I freeze and then make a call. Works wondrously.

  6. Non-obvious cause by NixieBunny · · Score: 5, Informative
    If you take the time to RTFA, you will see that the problem with TCP management (as Mr. Turner describes it) is that you have to cause the system to drop packets occasionally when it's near but not quite at saturation, to let the TCP device at the other end know that the network is getting congested. If there are no dropped packets, TCP ups the packet rate until the network becomes clogged.

    So in this case, zero packet loss is a setup for disaster instead of a desirable quality.

    The trouble is that introducing occasional packet loss is not an intuitive solution to a problem. It's usually something to avoid.
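
    The effect can be seen in a toy model. The sketch below is not Mr. Turner's analysis or AT&T's setup. It is a deliberately simplified additive-increase/multiplicative-decrease loop (one sender, one bottleneck queue, one step per round trip), and the function name, link capacity, base RTT, and buffer sizes are all made-up assumptions.

```python
def simulate_aimd(buffer_pkts, capacity_pkts_per_s=100.0, base_rtt_s=0.1, round_trips=500):
    """Toy AIMD sender against one bottleneck queue; returns (losses, worst RTT in s)."""
    # Packets the path can hold in flight before anything has to queue.
    pipe_pkts = capacity_pkts_per_s * base_rtt_s
    cwnd = 1.0
    losses = 0
    worst_rtt_s = base_rtt_s
    for _ in range(round_trips):
        backlog = max(0.0, cwnd - pipe_pkts)   # packets that do not fit in the pipe
        overflow = backlog > buffer_pkts       # the buffer cannot hold them all
        queued = min(backlog, buffer_pkts)
        # Round-trip time is the base RTT plus time spent draining the queue.
        worst_rtt_s = max(worst_rtt_s, base_rtt_s + queued / capacity_pkts_per_s)
        if overflow:
            losses += 1
            cwnd = max(1.0, cwnd / 2.0)        # multiplicative decrease on loss
        else:
            cwnd += 1.0                        # additive increase while nothing is lost
    return losses, worst_rtt_s


if __name__ == "__main__":
    for buffer_pkts in (10, 100_000):          # hypothetical small and oversized buffers
        losses, worst = simulate_aimd(buffer_pkts)
        print(f"buffer={buffer_pkts:>7} pkts  losses={losses:>3}  worst RTT={worst:.2f} s")
```

    With the small buffer the sender sees regular drops and the round-trip time stays bounded. With the oversized buffer it never sees a drop within the run, so the window and the queueing delay just keep climbing. Real TCP stacks add receive-window limits, timeouts, and slow start, none of which are modeled here.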

    --
    The determined Real Programmer can write Fortran programs in any language.
  7. Zero Packet Loss by Anonymous Coward · · Score: 5, Interesting

    Zero packet loss may sound impressive to a telephone guy, but it causes TCP congestion collapse and thus doesn't work for the mobile Internet!

    I was in the standardisation group that specified the RLC/MAC layer (ETSI SMG2, later called 3GPP TSG GERAN) and our priorities were not the behaviour of TCP. We were designing the radio layer to provide a bearer service for the higher layer protocols, which at that time were X.25 and IP (UDP and TCP). The "problem" we were trying to solve was the tendency of the radio layer to fade, have multipath and generally lose packets. The RLC layer was designed to deliver error-free packets, in sequence, over the radio layer. Generally that is exactly what it does, and does well. If it didn't, there would be no mobile internet.

    What we did find to be a significant performance problem was the asymmetric channel. The uplink is usually the root of the TCP performance issues; UDP works much better. When the discrepancy is higher than 10, i.e. the downlink is more than ten times faster than the uplink, the TCP ACKs don't arrive in time and it stalls. Sadly a faster uplink is difficult and expensive to provide.

  8. Re:This is impossible, I've seen the buffer settin by jmac_the_man · · Score: 5, Funny

    When did you last meet an unfunny penis?

    Probably when he met the guy that modded that comment down.

  9. Re:First Time by MichaelSmith · · Score: 5, Interesting

    I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition the links to the 3G cells are only just keeping up with demand right now.

    I pointed to the European environment where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem. Lots of profit being taken while they can get away with it.

  10. Re:Software Robustness by Anonymous Coward · · Score: 5, Funny

    I walk around until I see an incoming arrow. I freeze and

    people and cars crash into me

  11. Re:First Time by jyx · · Score: 5, Insightful

    Oh noes, I'm feeding the trolls again.

    But whatever... it doesn't matter. Because at the end of the day, the techie nerds will continue to have no respect for management... and then they'll wonder why they're treated with no respect in return.

    So you think the techies that have taken the time to explain all the reasons *why* something needs to be done are stupid.

    But you can sit back and say 'lose 8% from your budget - go do it'. No reasons, no explanations, just a demand. (Brillant!).

    I'm guessing you're also the same arsehole that screams at the 'stupid' techies for not being able to restore that sales contract from two months ago that you accidentally deleted, forgetting about that replacement for the broken tape drive you refused to pay for last quarter.

    As a manager you have got to be the conduit between the workers and the directors. Here's a tip, how about try talking to your techies. No seriously, talk to them. Show them your budget, show them your overheads. Ask them to provide assistance in setting the priorities instead of telling them to get stuffed.

    You may end up *earning* some respect from the people who are actually keeping your company running and who don't play musical employers when things start getting too hard.

  12. Benoit Mandelbrot had a similar problem by j-stroy · · Score: 5, Interesting

    As I recall, the story went: Mandelbrot was a mathematician at an IBM lab. The engineers were attempting high speed data networking, but were encountering data/signal loss due to some noise. So like good engineers, they made things more robust: better isolation, grounds, shielding, etc. But the darn noise was still there. They could not get rid of it. Determined to find the cause, they went to Mandelbrot with the request to analyze the noise, to determine its cause, in order to eliminate it.

    Mandelbrot examined the data and found that there were periods of clear signal interrupted by noise. He examined the noise and found that within it were periods of clear signal, interrupted by noise, and so on. Hmmm... He astutely determined that "shit happens" and what was needed was a redundant protocol, not better shielding. The noise, you see, was inherent in a damped and driven system.

    It was from this that he began his explorations of fractals and chaos theory, and we got robust network protocols.