Google, Microsoft Cheat On Slow-Start — Should You?

← Back to Stories (view on slashdot.org)

Google, Microsoft Cheat On Slow-Start — Should You?

Posted by Soulskill on Friday November 26, 2010 @06:47AM from the reply-hazy-ask-again dept.

kdawson writes "Software developer and blogger Ben Strong did a little exploring to find out how Google achieves its admirably fast load times. What he discovered is that Google, and to a much greater extent Microsoft, are cheating on the 'slow-start' requirement of RFC-3390. His research indicates that discussion of this practice on the Net is at an early, and somewhat theoretical, stage. Strong concludes with this question: 'What should I do in my app (and what should you do in yours)? Join the arms race or sit on the sidelines and let Google have all the page-load glory?'"

8 of 123 comments (clear)

Min score:

Reason:

Sort:

Misread the RFC by Spazmania · 2010-11-26 06:54 · Score: 5, Informative

RFC 3390 uses the "MUST" terminology exactly one place: when describing behavior after a packet is lost during the syn/synack. It doesn't use the phrase "MUST NOT" anywhere.
In every other respect slow-start is recommended but optional. Google is in no way breaching the standard by not using it.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
1. Re:Misread the RFC by da+cog · 2010-11-26 07:10 · Score: 5, Insightful
  
  Yes, and for a post complaining about cheating I am mildly annoyed that he himself cheated his way around my "filter all posts made by editor kdawson" setting by submitting his story as a normal user and then getting another editor to post it.
  
  --
  Snarkiness is inversely proportional to wisdom because it emphasizes feeling right rather than being right.
2. Re:Misread the RFC by fluffy99 · 2010-11-26 09:43 · Score: 5, Informative
  
  Not sure why you got modded informative since the original poster and your "me-too" are both wrong . RFC 3390 is an extension to RFC2581. RFC 3390 says you MAY use an IW of up to 4 segments. If you don't use this option, you fall under RFC2581 which says the IW MUST be less than or equal to 2 segments.
  http://www.rfc-editor.org/rfc/rfc3390.txt
  http://www.rfc-editor.org/rfc/rfc2581.txt
lol kdawson by Lunix+Nutcase · 2010-11-26 06:56 · Score: 5, Interesting

So kdawson couldn't post this FUD himself? He needed Soulskill to do it for him?
Editors shouldn't be allowed to post stories. by BitZtream · 2010-11-26 07:16 · Score: 5, Insightful

I intentionally removed kdawson and timothy from the front page on slashdot just so I wouldn't have to see their ignorant, retarded, not a fucking clue posts ...
Did they realize that no one read their tripe anymore now they have to have someone else approve it for them?
kdawson and timothy are idiots, please give me a way to automatically not see anything that has to do with those two morons. Please.
kdawson is cheating to get around the effort I put on not seeing his crap, MS and Google on the other hand are following the RFC just fine ... if anyone involved in the posting of this story had a clue about what it said or did any sort of actual research than I wouldn't have to rant about it ...

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
This is well known to a small community by Animats · 2010-11-26 07:36 · Score: 5, Insightful

That's been known in the TCP community for decades.
I looked at this back in my RFC 896 days, when TCP was in initial development and I was working on congestion. I introduced the "congestion window" concept and put it in a TCP implementation (3COM's UNET, which predated Berkeley BSD). The question was, what should be the initial size of the congestion window? If it's small, you get "slow start"; if it's large, the sender can blast a big chunk of data at the receiver at start, up to the amount of buffering the receiver is advertising.
I decided back then to start with a big congestion window, because starting with a small one would slow down traffic even when bandwidth was available. One of the big performance issues back then was the time required to FTP a directory across a LAN, where TCP connections were being set up and torn down at a high rate. So startup time mattered. The decision to go with a smaller initial congestion window size came years later, from others. This reflected trends in router design. I wanted routers to have "fair queuing", so that sending lots of packets from one source didn't gain the sender any bandwidth over sending few packets. But routers gained speed faster than RAM costs dropped, and so faster routers couldn't have enough RAM for fair queuing. Today, your "last mile" CISCO router might have fair queuing. Some DOCSIS cable modem termination units have it. But many routers are running Random Early Drop, which is a simple but mediocre approach. (The backbone routers barely queue at all; if they can't forward something fast, they drop it. Network design tries to keep the congestion near the edges, where it can be dealt with.)
Remember, every dropped packet has to be retransmitted. (Too much of that leads to congestion collapse, a term I coined in 1984. That's what the "Nagle algorithm" is about.) In a world with packet-dropping routers, "slow start" makes sense. So that was put into TCP in the late 1980s (by which time I was out of networking.)
However, the RFC-documented slow start algorithm is rather conservative. RFC 2001 says to start at one maximum segment size. Microsoft's implementations in Win95 and later start at two maximum segment sizes. In RFC 3390, from 2002, the limit was raised to 3 or 4 maximum segment sizes. (We used to worry about delaying keystroke echo too much because big FTP packets were tying up the 9600 baud lines too long. We're past that.)
But Google is sending at least 8 segments at start, and Microsoft was observed to be sending 43. Sending 43 packets blind is definitely overdoing it.
I wonder whether they're doing this blindly, or if there's more smarts behind the scenes. If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one. So, having discovered a path that's not dropping big bursts of packets, they could legitimately start fast. If they're just doing it the dumb way, starting fast every time, that's going to choke some part of the net under heavy load.
1. Re:This is well known to a small community by carton · 2010-11-26 08:44 · Score: 5, Insightful
  
  Yes, that's my understanding as well---the point of slow start is to go easy on the output queues of whichever routers experience congestion, so if congestion happens only on the last mile a hypothetical bad slow-start tradeoff does indeed only affect that one household (not necessarily only that one user), but if it happens deeper within the Internet it's everyone's problem contrary to what some other posters on this thread have been saying.
  WFQ is nice but WFQ currently seems to be too complicated to implement in an ASIC, so Cisco only does it by default on some <2Mbit/s interfaces. Another WFQ question is, on what inputs do you do the queue hash? For default Cisco it's on TCP flow, which helps for this discussion, but I will bet you (albeit a totally uninformed bet) that CMTS will do WFQ per household putting all the flows of one household into the same bucket, since their goal is to share the channel among customers, not to improve the user experience of individual households---they expect people inside the house to yell at each other to use the internet ``more gently'' which is pathetic. In this way, WFQ won't protect a household's skype sessions from being blasted by MS fast-start the way Cisco default WFQ would.
  If anything, cable plants may actually make TCP-algorithm-related congestion worse because I heard a rumor they try to conserve space on their upstream channel by batching TCP ACK's, which introduces jitter, meaning the windowsize needs to be larger, and makes TCP's downstream more ``microbursty'' than it needs to be. If they are going to batch upstream on purpose, maybe they should timestamp upstream packets in the customer device and delay them in the CMTS to simulate a fixed-delay link---they could do this TS+delay per-flow rather than per-customer if they do not want to batch all kinds of packets (ex maybe let DNS ones through instantly).
  RED is not too complicated to implement in ASIC, but (a) I think many routers, including DSLAM's, actually seem to be running *FIFO* which is much worse than RED even, because it can cause synchronization when there are many TCP flows---all the flows start and stop at once. (b) RED is not that good because it has parameters that need to be tuned according to approximately how many TCP flows there are. I think BLUE is much better in this respect, and is also simple enough to implement in ASIC, but AFAIK nobody has.
  I think much of the conservatism on TCP implementers' part can be blamed on router vendors failing to step up and implement decades-old research on practical ASIC-implementable queueing algorithms. I've the impression that even the latest edge stuff focuses on having deep, stupid (FIFO) queues (Arista?) or minimizing jitter (Nexus?). Cisco has actually taken RED *off* the menu for post-6500 platforms: 3550 had it on the uplink ports, but 3560 has ``weighted tail drop'' which AFAICT is just fancy FIFO. I'd love to be proved wrong by someone who knows more, but I think they are actually moving backwards rather than stepping up and implementing BLUE.
  and I like very much your point that cacheing window sizes per /32 is the right way to solve this rather than haggling about the appropriate default, especially in the modern world of megasites and load balancers where a clever site could appear to share this cached knowledge quite widely. but IMSHO routing equipment vendors need to be stepping up to the TCP game, too.
2. Re:This is well known to a small community by egoots · 2010-11-26 09:26 · Score: 5, Informative
  
  Are you a wizard?
  No, he's John Nagle.