A Possible Cause of AT&T's Wireless Clog — Configuration Errors
AT&T customers (iPhone users notably among them) have seen some wireless congestion in recent months; Brough Turner thinks the trouble might be self-inflicted. According to Turner, the poor throughput and connection errors can be chalked up to "configuration errors specifically, congestion collapse induced by misconfigured buffers in their mobile core network." His explanation makes an interesting read.
I find it just as problematic that application software on Windows Mobile and similar mobile OSes does not handle large network delays gracefully.
There is often very little feedback to the user that actual progress is being made in the attempt to communicate over the network. Sure, we can use the fuzzy "bars" indicator on the device to help diagnose the cause of our trouble, but the bars reflect the radio signal, not how much capacity the network actually has available. We also have the animated indicators that web browsers and other applications use, but these still don't show whether any communication has actually succeeded. In web browsers we get text alluding to the DNS lookup and the connection attempt, but 'Connecting to...' combined with a simple spinning indicator or progress bar often doesn't convey whether the message reached any destination, or how long you can expect to wait for a response given the network's current operating conditions.
The writers of the software may not fully understand the implications of being on a network with high packet loss or long round-trip times, so their applications time out or report errors that more patience or a retry would have resolved. A mobile OS should probably take this into account at the OS level, and let the programmer or user opt out of this behavior only when they explicitly ask to (if that option is exposed).
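A minimal sketch of the "more delay or retry" behavior described above, assuming a Python client where the network call is wrapped in a zero-argument function. The name `retry_with_backoff` and its default delays are hypothetical, chosen for illustration; exponential backoff is one common way to add patience to a client, not necessarily what any mobile OS actually does.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying transient network errors with exponential backoff.

    Hypothetical helper: defaults are illustrative, not tuned for any network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Double the wait after each failure (1 s, 2 s, 4 s, ...), capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

Passing `sleep` in lets a caller (or a test) substitute a fake clock; a UI could also use each backoff step to tell the user that a retry is under way, rather than just spinning.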
It doesn't help that the Safari browser on the iPhone will load a page twice. Even if the user closes Safari for a couple of minutes, the current page reloads when the browser is reopened. Lose-lose for everyone.
I worked for AT&T in several parts of the country on their core networks. In the early 2000s they had misconfigured all of their Solaris boxes, and I worked with the infrastructure group to implement a startup script on Solaris that tuned all the ndd settings for performance. The problem with Solaris is that by default all the TCP, UDP, Ethernet, etc. settings are set for a desktop workstation, not a server. Most system admins know to tune these settings; otherwise, in a lot of cases a multi-CPU box will perform as slowly as a 1-CPU box. Anyway, at the specific companies I worked with (AT&T Broadband / Worldnet in St. Charles, MO was one big one), all the servers were configured without the proper server settings, so we had all kinds of issues as a result. A big one was that the TCP accept queue was not set high enough, so connections to daemons would be dropped after a low number of connections, making it appear that the box couldn't handle the load. As a result, they had spent millions on numerous servers (in one situation they had over twenty 12-CPU servers just for SMTP...
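Why an undersized accept queue looks like an underpowered box can be seen in a deliberately simplified tick-by-tick model. This is a hypothetical sketch, not how the Solaris TCP stack actually schedules work: completed handshakes land in a queue of fixed size (on Solaris that cap is governed by the `tcp_conn_req_max_q` ndd tunable together with the application's `listen()` backlog), the daemon drains a fixed number per tick, and anything arriving at a full queue is dropped.

```python
def simulate_accept_queue(arrivals_per_tick, accepts_per_tick, backlog):
    """Return (accepted, dropped) for a listener whose queue holds `backlog` entries.

    Toy model: every tick, new connections try to join the queue, then the
    daemon accepts up to `accepts_per_tick` of the queued ones.
    """
    queued = 0
    accepted = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        # Connections that find the queue full are dropped.
        admitted = min(arriving, backlog - queued)
        dropped += arriving - admitted
        queued += admitted
        # The daemon pulls as many queued connections as it can handle this tick.
        served = min(queued, accepts_per_tick)
        accepted += served
        queued -= served
    return accepted, dropped


burst = [100, 0, 0, 0, 0]  # illustrative: 100 clients arrive at once, then silence
print(simulate_accept_queue(burst, accepts_per_tick=20, backlog=5))
print(simulate_accept_queue(burst, accepts_per_tick=20, backlog=128))
```

The daemon's processing rate is identical in both runs; only the queue depth differs, yet the small queue turns one brief burst into 95 dropped connections while the deeper queue serves all 100 within five ticks. Adding CPUs does nothing for the first case, which is consistent with the point above about money spent on extra servers.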
These changes seem small; however, changing "ndd" kernel parameters on a Solaris box is not a single task but an infrastructure-wide one that requires the coordination of dozens of different groups, so it really took a long, long time to get this script implemented. It was called "S99nddfix" and it held all the ndd tunable parameters. Later, when I worked for a different AT&T group in a different state, I noticed my script had been deployed on all the Solaris servers in that 200+ server environment.
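The original S99nddfix isn't reproduced here, but a sketch of the idea might generate a boot-time shell script of `ndd -set` lines from a single table of values. The parameter names below are real Solaris `/dev/tcp` tunables; the values, the `NDD_TUNABLES` table, and the `render_ndd_script` helper are hypothetical placeholders, and appropriate settings depend on the workload and the Solaris release.

```python
# Hypothetical values for illustration only; tune for your own workload and release.
NDD_TUNABLES = {
    "/dev/tcp": {
        "tcp_conn_req_max_q": 1024,    # cap on fully established connections awaiting accept()
        "tcp_conn_req_max_q0": 4096,   # cap on half-open (handshake in progress) connections
        "tcp_xmit_hiwat": 65536,       # default send buffer size
        "tcp_recv_hiwat": 65536,       # default receive buffer size
    },
}


def render_ndd_script(tunables):
    """Render an rc-style start script that applies each tunable with `ndd -set`."""
    lines = ["#!/bin/sh", "# Apply ndd tunables at boot."]
    for device, params in tunables.items():
        for name, value in sorted(params.items()):
            lines.append(f"ndd -set {device} {name} {value}")
    return "\n".join(lines) + "\n"


print(render_ndd_script(NDD_TUNABLES), end="")
```

Generating the script from one table keeps every server on the same reviewed set of values, which matters when the rollout has to pass through many groups.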
This is the problem. Thanks to competitive barriers (phones can be moved between only two of the top four networks, and between none of the top three), switching can take a long time: a 2-year contract must expire before someone can move networks, unless they want to pay a large fee.
And then you probably lose your phone. So even if you like it, you have to buy either a different phone from the new provider or their version of the same one. Both will cost you even more money, unless you're willing to be stuck on another 2-year contract.
The US system is very well set up, as far as carrier lock-in goes.
It's rather amazing how many people go to AT&T for the iPhone. I think they said about 1/3 of their iPhone customers are coming from other networks. I wonder how many more people would get iPhones if it weren't for their current contracts. That's a big reason for many people I've talked to. The rest who want an iPhone are in the "I'd love it but I'm not touching AT&T again" camp.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Zero packet loss may sound impressive to a telephone guy, but it causes TCP congestion collapse and thus doesn't work for the mobile Internet!
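Some back-of-the-envelope arithmetic shows how never dropping packets can hurt TCP. This is a minimal sketch under stated assumptions: a single FIFO buffer draining at a constant link rate, and a fixed 1-second retransmission timeout (the initial RTO recommended by RFC 6298). Real stacks adapt the RTO to measured RTT, so the fixed value stands in for delay that grows faster than the estimator can track. The function names, buffer size, and link rate are illustrative, not measurements of AT&T's network.

```python
def queueing_delay_s(buffer_bytes: int, link_bps: float) -> float:
    """Seconds for a full FIFO buffer to drain at the given link rate."""
    return buffer_bytes * 8 / link_bps


def timeout_before_delivery(base_rtt_s: float, buffer_bytes: int,
                            link_bps: float, rto_s: float = 1.0) -> bool:
    """True if a packet at the tail of a full buffer waits longer than the RTO.

    When this holds, the sender retransmits a packet that was only delayed,
    not lost, so the lossless buffer ends up carrying duplicate copies.
    """
    return base_rtt_s + queueing_delay_s(buffer_bytes, link_bps) > rto_s


# Illustrative numbers: a 1 MB buffer in front of a 384 kbit/s bearer.
delay = queueing_delay_s(1_000_000, 384_000)
print(f"worst-case queueing delay: {delay:.1f} s")  # prints: worst-case queueing delay: 20.8 s
```

Under these assumptions a full buffer holds packets for about 20 seconds, far past any reasonable timeout, so retransmissions pile onto a queue that is already full. Dropping or marking packets early keeps the queue short and gives TCP's congestion control a signal to slow down.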
I was in the standardisation group that specified the RLC/MAC layer (ETSI SMG2, later called 3GPP TSG GERAN), and the behaviour of TCP was not among our priorities. We were designing the radio layer to provide a bearer service for the higher-layer protocols, which at that time were X.25 and IP (UDP and TCP). The "problem" we were trying to solve was the tendency of the radio layer to fade, suffer multipath, and generally lose packets. The RLC layer was designed to deliver error-free packets, in sequence, over the radio layer. Generally that is exactly what it does, and does well. If it didn't, there would be no mobile internet.
What we did find to be a significant performance problem was the asymmetric channel. The uplink is usually the root of TCP performance issues; UDP works much better. When the asymmetry exceeds 10 (the downlink more than ten times faster than the uplink), the TCP ACKs don't arrive in time and the connection stalls. Sadly, a faster uplink is difficult and expensive to provide.
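The ACK bottleneck can be put in rough numbers. A minimal sketch, assuming each ACK costs a fixed number of bytes on the uplink, that the receiver acknowledges every second full-size segment (delayed ACKs), and that nothing else competes for the uplink. The helper name is hypothetical, and `ack_wire_bytes` is meant to capture whatever the radio uplink really spends per ACK, which is more than the bare 40-byte TCP/IP header once link-layer overhead is included.

```python
def ack_limited_downlink_bps(uplink_bps: float, ack_wire_bytes: int,
                             mss_bytes: int = 1460,
                             segments_per_ack: int = 2) -> float:
    """Upper bound on TCP download rate set by how fast the uplink can carry ACKs.

    With ACK clocking, each ACK lets the sender release about
    `segments_per_ack` more full-size segments, so the downlink can go no
    faster than the ACK rate times the data each ACK releases.
    """
    acks_per_second = uplink_bps / (ack_wire_bytes * 8)
    return acks_per_second * segments_per_ack * mss_bytes * 8
```

For example, with an illustrative 10 kbit/s uplink and 100 bytes per ACK, the bound works out to 292 kbit/s of downlink; any downlink capacity beyond that goes unused however fast the radio is, and the bound drops further as the per-ACK cost on the uplink grows.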
I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic that currently goes through wireless and wired LANs will soon run over the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition, the links to the 3G cells are only just keeping up with demand right now.
I pointed to the European environment, where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem: lots of profit being taken while they can get away with it.
http://michaelsmith.id.au
Yeah, I love the lack of forward planning by Telcos in Australia.
Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries.
I don't have an iPhone and don't have experience with this particular problem, but in general there aren't automatic monitoring devices for mobile networks out in the field, so if AT&T wants to know what is happening on the devices, they have to send a team out with tools and monitoring equipment to check. If this is a problem that only happens when several iPhone users get together in an area at the same time, the problem may have gone away by the time a team comes out to check (if they come at all).
Qxe4
As I recall, the story went like this: Mandelbrot was a mathematician at an IBM lab. The engineers were attempting high-speed data networking but were encountering data/signal loss due to some noise. So, like good engineers, they made things more robust: better isolation, grounds, shielding, etc. But the darn noise was still there; they could not get rid of it. Determined to find the cause, they went to Mandelbrot with a request to analyze the noise and determine its cause, in order to eliminate it.
Mandelbrot examined the data and found that there were periods of clear signal interrupted by noise. He examined the noise and found that within it were periods of clear signal, interrupted by noise, and so on. Hmmm... He astutely determined that "shit happens" and that what was needed was a redundant protocol, not better shielding. The noise, you see, was inherent in a damped and driven system.
It was from this that he began his explorations of fractals and chaos theory, and we got robust network protocols.
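The "noise within the noise" structure in the story can be illustrated with a Cantor-set construction: start with an all-noisy stretch, clear its middle third, and repeat inside each remaining noisy third. This is a toy sketch of self-similar error clustering, with a hypothetical function name, and not a reconstruction of the model Mandelbrot actually fitted to the IBM data.

```python
def cantor_error_pattern(depth: int) -> list[bool]:
    """Return 3**depth positions; True marks a position hit by noise."""
    pattern = [True] * (3 ** depth)

    def clear_middle(start: int, length: int, level: int) -> None:
        if level == 0:
            return
        third = length // 3
        # The middle third of this noisy stretch becomes clean signal...
        for i in range(start + third, start + 2 * third):
            pattern[i] = False
        # ...and each outer third repeats the same structure at a smaller scale.
        clear_middle(start, third, level - 1)
        clear_middle(start + 2 * third, third, level - 1)

    clear_middle(0, len(pattern), depth)
    return pattern


print("".join("x" if hit else "." for hit in cantor_error_pattern(3)))
# prints: x.x...x.x.........x.x...x.x
```

Zooming into any noisy stretch reveals the same mix of clean gaps and bursts at a smaller scale, which is why the story's conclusion favors error detection and retransmission in the protocol over trying to eliminate the noise outright.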