Google, Microsoft Cheat On Slow-Start — Should You?

Posted by Soulskill on Friday November 26, 2010 @06:47AM from the reply-hazy-ask-again dept.

kdawson writes "Software developer and blogger Ben Strong did a little exploring to find out how Google achieves its admirably fast load times. What he discovered is that Google, and to a much greater extent Microsoft, are cheating on the 'slow-start' requirement of RFC-3390. His research indicates that discussion of this practice on the Net is at an early, and somewhat theoretical, stage. Strong concludes with this question: 'What should I do in my app (and what should you do in yours)? Join the arms race or sit on the sidelines and let Google have all the page-load glory?'"

I do. by cgomezr · 2010-11-26 06:52 · Score: 4, Funny

Without cheating, I wouldn't get the first post.
1. Re:I do. by Tridus · 2010-11-26 06:59 · Score: 2, Funny
  
  Is this possibly the first ever on-topic "first!" post?
  
  --
  -- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
2. Re:I do. by Jello+B. · 2010-11-26 07:27 · Score: 3, Informative
  
  not even close, sorry
3. Re:I do. by Tubal-Cain · 2010-11-26 13:30 · Score: 3, Informative
  
  Not even the first one this week.
Misread the RFC by Spazmania · 2010-11-26 06:54 · Score: 5, Informative

RFC 3390 uses the "MUST" terminology exactly one place: when describing behavior after a packet is lost during the syn/synack. It doesn't use the phrase "MUST NOT" anywhere.
In every other respect slow-start is recommended but optional. Google is in no way breaching the standard by not using it.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
1. Re:Misread the RFC by H0p313ss · 2010-11-26 07:02 · Score: 4, Informative
  
  RFC 3390 uses the "MUST" terminology exactly one place: when describing behavior after a packet is lost during the syn/synack. It doesn't use the phrase "MUST NOT" anywhere.
  In every other respect slow-start is recommended but optional. Google is in no way breaching the standard by not using it.
  I just logged in to say exactly the same thing. Not implementing an optional variant is not cheating. Nothing to see, move along.
  
  --
  XML is known as a key material required to create SMD: Software of Mass Destruction
2. Re:Misread the RFC by Lunix+Nutcase · 2010-11-26 07:03 · Score: 4, Insightful
  
  No, this was just kdawson trying to fill his FUD quota for the day. He's a little behind.
3. Re:Misread the RFC by Anonymous Coward · 2010-11-26 07:07 · Score: 2, Informative
  
  RFC 3390 defines the upper bound for the initial window to be min (4*MSS, max (2*MSS, 4380 bytes)), so it doesn't need to use "MUST NOT" to forbid larger initial window sizes.
4. Re:Misread the RFC by da+cog · 2010-11-26 07:10 · Score: 5, Insightful
  
  Yes, and for a post complaining about cheating I am mildly annoyed that he himself cheated his way around my "filter all posts made by editor kdawson" setting by submitting his story as a normal user and then getting another editor to post it.
  
  --
  Snarkiness is inversely proportional to wisdom because it emphasizes feeling right rather than being right.
5. Re:Misread the RFC by Lunix+Nutcase · 2010-11-26 07:13 · Score: 3, Insightful
  
  He probably knows he's being filtered by more and more people.
6. Re:Misread the RFC by Spazmania · 2010-11-26 07:17 · Score: 4, Insightful
  
  IETF uses the capitalized MUST/MUST NOT terminology for a reason. It's used anywhere an implementer could reasonably do something else but for some reason isn't allowed to. Where it isn't present, it isn't required. If the authors omitted that terminology even after referencing RFC 2119 in a standards track modification to such a widely used protocol, they did so because the entire modification is optional.
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
7. Re:Misread the RFC by petermgreen · 2010-11-26 07:20 · Score: 1
  
  This increased initial window is optional: a TCP MAY start with a larger initial window. However, we expect that most general-purpose TCP implementations would choose to use the larger initial congestion window given in equation (1) above.
  
  --
  note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
8. Re:Misread the RFC by Anonymous Coward · 2010-11-26 07:21 · Score: 1, Informative
  
  From the RFC.
  This document obsoletes [RFC2414] and updates [RFC2581] and specifies
  an increase in the permitted upper bound for TCP's initial window
  from one or two segment(s) to between two and four segments.
  So it was officially increased in 2002.
  Maybe back then an initial window of 2-4 segments seemed reasonable.
  Maybe the official standard is due for an update.
  For some reason I am not indignant about this news.
9. Re:Misread the RFC by Anonymous Coward · 2010-11-26 07:22 · Score: 3, Informative
  
  Indeed the modification is optional. It explicitly says so in the RFC. However, without the modification an even smaller initial window is set by the previous definition, which comes with all the MUSTs and MUST NOTs you can throw at an implementer.
10. Re:Misread the RFC by halivar · 2010-11-26 07:33 · Score: 2, Funny
  
  IOW, RTFRFC.
11. Re:Misread the RFC by Anonymous Coward · 2010-11-26 07:42 · Score: 3, Informative
  
  Yes, that means you're free to use the (smaller) limit from the older RFC or the new (larger) one from RFC3390. The authors expect most implementations to use the new one, which would allow Google to send 3 packets without waiting for ACKs. Google sends up to 9.
12. Re:Misread the RFC by ChipmunkDJE · 2010-11-26 07:48 · Score: 1
  
  So then I guess everybody should just skip slow-start then? If Google and Microsoft can and are having tremendous results, why shouldn't everybody? Heck, why is slow-start even still around then? Should be tossed to the wayside like a Colecovision if it's optional and gets in the way of your performance...
13. Re:Misread the RFC by msauve · 2010-11-26 07:54 · Score: 1
  
  Even simpler. The very first line of the abstract says "This document specifies an optional standard..." The whole thing is a "MAY."
  
  --
  "National Security is the chief cause of national insecurity." - Celine's First Law
14. Re:Misread the RFC by Spazmania · 2010-11-26 07:54 · Score: 1
  
  Reference please? I'm afraid I'm not up on the sequence of TCP RFCs so I don't know where to find the "previous definition."
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
15. Re:Misread the RFC by Anonymous Coward · 2010-11-26 08:02 · Score: 3, Informative
  
  Do you always have other people do your homework?
  From RFC3390 (that's the one we're discussing):
  "This document obsoletes [RFC2414] and updates [RFC2581] and specifies
  an increase in the permitted upper bound for TCP's initial window
  from one or two segment(s) to between two and four segments."
  I'd start with the one which RFC3390 updates.
16. Re:Misread the RFC by Spazmania · 2010-11-26 08:30 · Score: 3, Insightful
  
  Kay, so I've poked through the RFCs a bit...
  TCP first defined in RFC 793. No slow start; implementations generally send segments up to the window size negotiated in the SYN exchange, which is generally the smaller of the speakers' two buffers.
  Slow start first referenced in RFC 1122 (Internet host requirements) as: ''Recent work by Jacobson [ACM SIGCOMM-88] on Internet congestion and TCP retransmission stability has produced a transmission algorithm combining "slow start" with "congestion avoidance". A TCP MUST implement this algorithm.''
  At this point in the process there does not appear to be an RFC specifying TCP slow start, which makes this statement, in a document that is not itself about TCP per se, very dubious.
  A decade later, RFC 2001 says: "Modern implementations of TCP contain four intertwined algorithms that have never been fully documented as Internet standards: slow start, congestion avoidance, fast retransmit, and fast recovery." The word "must" is subsequently used in connection with congestion avoidance but is not used in connection with slow start.
  RFC2414 then revisits the question of TCP's initial window size selection referencing RFC 2001 but again declines to state that TCP "must" start with a small window.
  RFC 2581 finally sets an unambiguous slow start requirement: The slow start and congestion avoidance algorithms MUST be used by a TCP sender [...] IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS bytes and MUST NOT be more than 2 segments.
  However, even as it does so, it goes on to comment that, "We note that a non-standard, experimental TCP extension allows that a TCP MAY use a larger initial window [...] We do NOT allow this change as part of the standard defined by this document. However, we include discussion [...] in the remainder of this document as a guideline for those experimenting with the change, rather than conforming to the present standards for TCP congestion control."
  In other words, even though out of the box TCPs MUST implement slow start, it's understood that other behaviors are in use and are expected to continue.
  Finally, RFC 3390 allows the out-of-the-box behavior of TCP to use a larger initial window than 2581.
  Conclusion: Google still isn't cheating.
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
17. Re:Misread the RFC by presidenteloco · 2010-11-26 08:56 · Score: 1
  
  Are you sure that the experimental extension being referred to in RFC 2581 is not the one that was later formalized as RFC 3390, whose larger limits are still apparently being violated by Microsoft, Google and several others?
  Also, the RFC 2581 standard noted that "a NON-STANDARD EXPERIMENTAL TCP extension allows...bigger"
  Experimentation might be permissible, but vast-scale operational use of the non-standard extension by Google & Microsoft cannot be described as experimentation, and therefore is clearly not contemplated, nor countenanced, by RFC 2581.
  Unless you believe that Japanese "scientific whaling" is actually thousands and thousands of experiments on whales, that is.
  
  --
  
  Where are we going and why are we in a handbasket?
18. Re:Misread the RFC by Spazmania · 2010-11-26 08:56 · Score: 1
  
  Had you just quoted a standards document that says "Here's how it's supposed to be done, and now we'll offer some suggestions for all of you who decide not to do it this way," I'm not so sure I'd be quick to accuse you of being the source of any cognitive dissonance.
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
19. Re:Misread the RFC by Anonymous Coward · 2010-11-26 09:04 · Score: 1, Funny
  
  So are you.
20. Re:Misread the RFC by greed · 2010-11-26 09:10 · Score: 2, Funny
  
  Ah. I was a bit surprised to see this is a kdawson story for exactly that reason. Thanks.
  Where's my bigger hammer?
21. Re:Misread the RFC by fluffy99 · 2010-11-26 09:43 · Score: 5, Informative
  
  Not sure why you got modded informative, since the original poster and your "me-too" are both wrong. RFC 3390 is an extension to RFC 2581. RFC 3390 says you MAY use an IW of up to 4 segments. If you don't use this option, you fall under RFC 2581, which says the IW MUST be less than or equal to 2 segments.
  http://www.rfc-editor.org/rfc/rfc3390.txt
  http://www.rfc-editor.org/rfc/rfc2581.txt
22. Re:Misread the RFC by fluffy99 · 2010-11-26 09:45 · Score: 2, Informative
  
  Learn how to use Google man!
  http://www.rfc-editor.org/rfc/rfc3390.txt
  http://www.rfc-editor.org/rfc/rfc2581.txt
23. Re:Misread the RFC by Spazmania · 2010-11-26 09:51 · Score: 3, Insightful
  
  What, are you stupid?
  "Document A doesn't say what you claim."
  "Yeah, but there's a previous document which does."
  "What previous document is that?"
  "Hur, learn to use google dude."
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
24. Re:Misread the RFC by Spazmania · 2010-11-26 09:55 · Score: 1
  
  Heck, why is slow-start even still around then?
  Now that, my friend, is a VERY good question.
  
  --
  Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
25. Re:Misread the RFC by Nick+Ives · 2010-11-26 10:22 · Score: 2, Interesting
  
  We do NOT allow this change as part of the standard defined by this document.
  Seems fairly unambiguous to me.
  People have been gaming slow-start for yonks; I remember when you could ACK flood a server to increase your download speed. Server admins hated it because it slowed the site down for everyone else.
  
  --
  Nick
26. Re:Misread the RFC by James+Youngman · 2010-11-26 10:25 · Score: 2, Informative
  
  Indeed, Google already published a paper describing their approach:
  http://code.google.com/speed/articles/tcp_initcwnd_paper.pdf
27. Re:Misread the RFC by hasdikarlsam · 2010-11-26 10:37 · Score: 1
  
  As people keep saying, you *may* use this optional standard... as a replacement for an older standard where startup is *even slower*.
  Odd how that goes.
28. Re:Misread the RFC by Anonymous Coward · 2010-11-26 11:28 · Score: 1, Insightful
  
  IETF uses the capitalized MUST/MUST NOT terminology for a reason. It's used anywhere an implementer could reasonably do something else but for some reason isn't allowed to. Where it isn't present, it isn't required
  This is complete nonsense.
  The sign of a good RFC writer is not littering a document with MUST *** terminology. After a certain threshold it gets really old and implementors begin to ignore you.
  If there is a magic constant defined in an RFC, or an algorithm used in a certain way, more often than not it will NOT say that you MUST call the algorithm in this order with these special parameters. If, however, you don't follow the specification, you should not expect your implementation to work at all.
  Recommendations often have very significant side effects if they are not followed. No wording in an RFC should ever be construed as a substitute for using one's brain and understanding the underlying basis upon which the specification was arrived at.
  If an outfit like Google has a better way of characterizing the link, for example by keeping track of metrics obtained from recent connection histories, then good for them.
  If, however, they are just turning off congestion control because it makes "their" site faster, with the justification that "usually" it is not necessary, then fuck them.
  It seems to me the single worst thing one could do in a congested environment is add more connections with no realtime requirements in a non-congestion-avoidant manner.
  Until I see simulation results to the contrary (which are Google's burden to supply), I will just assume any instance of ignorant circumvention of slow-start is Google being Evil.
29. Re:Misread the RFC by samson13 · 2010-11-26 11:41 · Score: 1
  
  So then I guess everybody should just skip slow-start then? If Google and Microsoft can and are having tremendous results, why shouldn't everybody? Heck, why is slow-start even still around then? Should be tossed to the wayside like a Colecovision if it's optional and gets in the way of your performance...
  Slow start probably should be skipped for most well-tuned websites. Most HTTP connections are short-lived enough to never ramp up to the available bandwidth or saturate queues, so why use an algorithm designed to keep queues small while trying to use bandwidth efficiently?
  I think the slow start concept would still be useful for bulk transfer services. If you are serving a couple of gig ISO images, then you probably don't care about a bit of round-trip latency if it means you don't clobber router queues downstream. I could imagine congestion collapse would be more likely with this load.
  BitTorrent should probably use slow start. Often the competition for BitTorrent connections is other connections for the same torrent. If we start too fast, we could impact too many of these connections, causing them to back off and hurting overall performance.
  I'd guess that the magic numbers that were picked for slow start when the RFC was written are no longer applicable. RTT is shorter, queues are probably longer (near the edges anyway), but the queues are probably shorter in terms of time, i.e. less consequence for a dropped packet, less likely to fill a queue, and less of a performance hit if we do fill the queue.
  Google's choice of initial window size is probably well considered. If Google's tuning hurt network performance, it would be causing packet loss on their own connections, driving their latency up through retransmissions.
  Similarly, Microsoft's initial window size seems a bit ridiculous, so I'd bet it is one of:
  1) A mistake that is causing overall lower performance for their users.
  2) Coarse tuning that helps for the front page (so helps in general) but causes lower performance for bigger pages.
  3) Some sort of window size caching, with that number cached from previous connections.
  I did note that there were no retransmissions in the MS flow, so it doesn't seem like a bad guess. They don't support SACK (WTF), so that would slow things down if they lost packets.
30. Re:Misread the RFC by gringer · 2010-11-26 13:52 · Score: 1
  
  Learn how to use Google man!
  Maybe they tried, but their router rejected the connection from Google because it was sending too many packets in the initial window.
  
  --
  Ask me about repetitive DNA
31. Re:Misread the RFC by u38cg · 2010-11-26 19:12 · Score: 3, Interesting
  
  I thought he'd been sacked. I don't have him filtered (I like them where I can see them) and I haven't seen his stories for ages, or indeed anyone complaining about them :)
  
  --
  [FUCK BETA]
32. Re:Misread the RFC by Skal+Tura · 2010-11-27 04:12 · Score: 1
  
  Then how can the average server admin take advantage of this? ;)
  
  --
  Pulsed Media Seedboxes
33. Re:Misread the RFC by Skal+Tura · 2010-11-27 04:14 · Score: 1
  
  Maybe he cheated his way back in as well? ;)
  
  --
  Pulsed Media Seedboxes
34. Re:Misread the RFC by Skal+Tura · 2010-11-27 04:23 · Score: 1
  
  It could still be described as experimental: seeing whether it works in large-scale operation and across all kinds of devices. So it's not cheating if they intend to drop it, or, if it's good, to propose it in a subsequent RFC for widespread implementation.
  My personal general rule of thumb is: it's not cheating if it A) works and B) does not have any significant negative impact,
  and my other rule of thumb is: it's a must-have if A) it lowers the cost of operation and B) the net benefit greatly outweighs the negatives.
  Of course my general rules don't have anything to do with wider-scale implementation, but hey, this is /. and everyone's 2 cents is as good as fact, right? >;)
  
  --
  Pulsed Media Seedboxes
35. Re:Misread the RFC by jesset77 · 2010-11-28 14:18 · Score: 1
  
  Then how can the average server admin take advantage of this? ;)
  This.
  Good god, screw the standards. If it's good enough for microgoogle, we should be doing it too. The only thing we have to lose is visitors to our own damn sites, right? So how can I tune my kernel / apache to firehose the TCP window? xD
  
  --
  People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
36. Re:Misread the RFC by Lennie · 2010-12-01 10:19 · Score: 1
  
  It actually is already a step further than that; they have a draft RFC:
  http://tools.ietf.org/html/draft-ietf-tcpm-initcwnd-00
  
  --
  New things are always on the horizon
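The RFC 3390 upper bound quoted in this thread, min(4*MSS, max(2*MSS, 4380 bytes)), and the older RFC 2581 limit of 2*SMSS (at most 2 segments) are easy to evaluate for a few segment sizes. A minimal Python sketch; the function names are only illustrative:

```python
def rfc2581_initial_window(smss: int) -> int:
    """RFC 2581: IW MUST be <= 2*SMSS bytes and MUST NOT exceed 2 segments."""
    return 2 * smss


def rfc3390_initial_window(mss: int) -> int:
    """RFC 3390 upper bound on the initial window, in bytes:
    min(4*MSS, max(2*MSS, 4380 bytes))."""
    return min(4 * mss, max(2 * mss, 4380))


def rfc3390_segments(mss: int) -> int:
    """How many full-size segments fit inside the RFC 3390 byte bound."""
    return rfc3390_initial_window(mss) // mss


if __name__ == "__main__":
    for mss in (536, 1460, 2190, 4096):
        print(mss, rfc2581_initial_window(mss),
              rfc3390_initial_window(mss), rfc3390_segments(mss))
```

For the common 1460-byte Ethernet MSS the RFC 3390 bound is 4380 bytes, i.e. 3 full segments, which is where the "3 packets without waiting for ACKs" figure in the thread comes from; small segments (536 bytes) allow 4, and large ones fall back to 2.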
I would have got first post... by Anonymous Coward · 2010-11-26 06:54 · Score: 1, Funny

...if it wasn't for slow start. Damn you, cwnd!
lol kdawson by Lunix+Nutcase · 2010-11-26 06:56 · Score: 5, Interesting

So kdawson couldn't post this FUD himself? He needed Soulskill to do it for him?
1. Re:lol kdawson by canajin56 · 2010-11-26 07:24 · Score: 1
  
  Obviously, his answer to the question of "should you cheat, too?" is "yes", and he started by cheating his way around my preferences that exclude all kdawson articles ;)
  
  --
  ASCII stupid question, get a stupid ANSI
2. Re:lol kdawson by Morty · 2010-11-26 08:51 · Score: 3, Interesting
  
  So kdawson couldn't post this FUD himself? He needed Soulskill to do it for him?
  Considering that people cannot be objective about their own posts, I applaud kdawson for *not* posting this. Letting it go through someone else's editorial review is the right thing to do.
3. Re:lol kdawson by The+End+Of+Days · 2010-11-26 13:12 · Score: 1
  
  Editorial review? Your uid is small enough that you should know better. Maybe you're just super subtle.
4. Re:lol kdawson by Morty · 2010-11-26 14:32 · Score: 1
  
  I know it sometimes doesn't seem that way, but slashdot does have standards. The editors have written policies on how they do what they do, and they try to follow them. While slashdot editors often fail to live up to the standards that they strive for -- they tend to publish duplicate stories, press releases, trolls, advertisements, and blatant spelling errors -- they do tend to avoid the more egregious violations. The mistakes are more along the lines of sloppiness than malice. Presumably that's why we're all still here, no?
TFA is really interesting! by courteaudotbiz · 2010-11-26 07:00 · Score: 3, Interesting
Great, yet simple research! It's funny to see how the web servers act exactly like their parent companies in real life:
- Google: trying to be the first, tries to make a standard out of a promising trick;
- Microsoft: bypassing all rules to be the first;
- All the others: pretty average (I'd have expected Facebook to be more innovative on this side. Wait until they discover that this trick exists...)
Somebody call the waaaaambulance by js3 · 2010-11-26 07:03 · Score: 2, Insightful

When the competition starts crying you know someone is doing something right. Is it just me, or has there been a lot of crying lately?

--
did you forget to take your meds?
1. Re:Somebody call the waaaaambulance by Anonymous Coward · 2010-11-26 07:15 · Score: 2, Insightful
  
  To understand the relevance of this: The slow-start protocol/algorithm is meant to avoid a situation where many packets are put on the wire which will never be received due to congestion somewhere along the path. Such packets create unnecessary network load (they're transported all the way to the choke point and then they're discarded, so they have to be retransmitted.) The referenced RFC is from 2002, so one might argue that there isn't a problem if the burst of packets remains small. After all, there are other protocols which don't even use congestion control (particularly real-time applications like VoIP and other UDP based protocols) or cause bursts of initial traffic by concurrently starting many TCP connections (Bittorrent and other peer to peer networks). However, using an overly large initial window size is indeed a violation of a very central RFC, so it should not be done.
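The point about packets being "transported all the way to the choke point and then discarded" can be made concrete with a toy discrete-time model of one tail-drop FIFO bottleneck. This is an illustration only: the buffer size, the arrival-to-service ratio, and the function name are assumptions, not measurements of any real path.

```python
def tail_drop_losses(burst: int, buffer_pkts: int, arrivals_per_tick: int) -> int:
    """Count packets dropped when `burst` back-to-back packets reach a FIFO
    bottleneck that forwards 1 packet per tick, holds at most `buffer_pkts`
    packets, and receives `arrivals_per_tick` packets per tick (i.e. the
    upstream link is that many times faster than the bottleneck)."""
    queue = 0
    dropped = 0
    remaining = burst
    while remaining > 0:
        # Packets arriving this tick; whatever doesn't fit is tail-dropped.
        arriving = min(arrivals_per_tick, remaining)
        remaining -= arriving
        accepted = min(arriving, buffer_pkts - queue)
        queue += accepted
        dropped += arriving - accepted
        # The bottleneck then forwards one packet.
        if queue > 0:
            queue -= 1
    return dropped


if __name__ == "__main__":
    for burst in (3, 9, 43):
        print(burst, tail_drop_losses(burst, buffer_pkts=10, arrivals_per_tick=4))
```

With these made-up numbers, a 3- or 9-packet burst fits in the buffer, while a 43-packet burst loses 23 packets that have to be retransmitted; a deeper buffer hides the loss but adds queueing delay instead.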
Seems to me... by 91degrees · 2010-11-26 07:13 · Score: 3, Insightful

This is reliable. It is compatible with the spec (otherwise it wouldn't be reliable), and it's faster.

I don't think it matters whether Google "cheats" or not. I and they both want me to get the data as quickly as possible. Strict adherence to the guidelines doesn't matter to either of us and doesn't affect anyone else.
1. Re:Seems to me... by Mongoose+Disciple · 2010-11-26 07:14 · Score: 1
  
  This.
  In the real world, expect almost everyone to prioritize "what works best" over "what the standard says," except insofar as following the standard is necessary to get what works best.
2. Re:Seems to me... by mysidia · 2010-11-26 07:20 · Score: 2, Interesting
  
  Strict adherence to the guidelines doesn't matter to either of us and doesn't affect anyone else.
  The Goal of slow start is to achieve minimal loss and fairness with all flows.
  Fairness does affect other people. Not using slow start is much more aggressive and can stomp on other people's data flows, particularly when a shared WAN is involved, even flows that might be much more important than your casual Google search.
  But this may be a bigger concern for large ISPs that oversubscribe by having hundreds of thousands of customers, and only enough bandwidth to deliver the promised data rate for a few thousand.
3. Re:Seems to me... by WolfWithoutAClause · 2010-11-26 07:55 · Score: 3, Insightful
  
  It's going to be OK, provided it's only a small amount of traffic involved. But if everyone starts sending a lot of traffic like this... boom!
  In a sense Google are just saying that their search results are high priority traffic, and they've optimised it like that. Which is probably fair enough.
  But if you did that to anything that creates huge numbers of connections very rapidly and then sends a lot of data, perhaps using it for peer-peer networks, the network would start to suffer collapse.
  
  --
  -WolfWithoutAClause
  "Gravity is only a theory, not a fact!"
4. Re:Seems to me... by Lennie · 2010-12-01 11:34 · Score: 1
  
  Well, not really. They created a draft RFC which says we can all do this, because Google has a lot of visitors on their sites and they tested, monitored and analyzed this and wrote a paper about it.
  The point being that current connections have enough bandwidth to justify making another change to the standard. Instead of the old initial window of 3 or 4 (which was previously raised from 1 or 2), they propose to make it 9 or 10.
  One of the reasons they give is that current browsers (read: not IE6 or IE7) already open 6 connections per domain when downloading parts of a webpage. At 3 or 4 segments each, those 6 connections can already put 18 to 24 segments on the wire at once, which is more than the 10 packets a single connection with an initial window of 10 would send.
  http://code.google.com/speed/articles/tcp_initcwnd_paper.pdf
  http://tools.ietf.org/html/draft-ietf-tcpm-initcwnd-00
  
  --
  New things are always on the horizon
Editors shouldn't be allowed to post stories. by BitZtream · 2010-11-26 07:16 · Score: 5, Insightful

I intentionally removed kdawson and timothy from the front page on slashdot just so I wouldn't have to see their ignorant, retarded, not a fucking clue posts ...
Did they realize that no one reads their tripe anymore, so now they have to have someone else approve it for them?
kdawson and timothy are idiots, please give me a way to automatically not see anything that has to do with those two morons. Please.
kdawson is cheating to get around the effort I put into not seeing his crap; MS and Google, on the other hand, are following the RFC just fine ... if anyone involved in the posting of this story had a clue about what it said or did any sort of actual research, then I wouldn't have to rant about it ...

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
1. Re:Editors shouldn't be allowed to post stories. by Anonymous Coward · 2010-11-26 08:20 · Score: 1, Funny
  
  "please give me a way to automatically not see anything that has to do with those two morons. Please" Duct tape over eyes?
2. Re:Editors shouldn't be allowed to post stories. by Sir_Lewk · 2010-11-26 08:52 · Score: 1
  
  kdawson writes
  No, our gripe really is with kdawson.
  
  --
  "linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
This is well known to a small community by Animats · 2010-11-26 07:36 · Score: 5, Insightful

That's been known in the TCP community for decades.
I looked at this back in my RFC 896 days, when TCP was in initial development and I was working on congestion. I introduced the "congestion window" concept and put it in a TCP implementation (3COM's UNET, which predated Berkeley BSD). The question was, what should be the initial size of the congestion window? If it's small, you get "slow start"; if it's large, the sender can blast a big chunk of data at the receiver at start, up to the amount of buffering the receiver is advertising.
I decided back then to start with a big congestion window, because starting with a small one would slow down traffic even when bandwidth was available. One of the big performance issues back then was the time required to FTP a directory across a LAN, where TCP connections were being set up and torn down at a high rate. So startup time mattered. The decision to go with a smaller initial congestion window size came years later, from others. This reflected trends in router design. I wanted routers to have "fair queuing", so that sending lots of packets from one source didn't gain the sender any bandwidth over sending few packets. But routers gained speed faster than RAM costs dropped, and so faster routers couldn't have enough RAM for fair queuing. Today, your "last mile" Cisco router might have fair queuing. Some DOCSIS cable modem termination units have it. But many routers are running Random Early Drop, which is a simple but mediocre approach. (The backbone routers barely queue at all; if they can't forward something fast, they drop it. Network design tries to keep the congestion near the edges, where it can be dealt with.)
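The fair-queuing goal described above (a source that sends more packets gains no extra share) can be sketched with plain per-flow round-robin. This is a simplified stand-in, not WFQ as any router vendor implements it: packets are assumed equal-sized, the queue contents are a fixed snapshot, and the `key` function that assigns packets to queues is an illustrative assumption.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Iterable, List, Tuple

Packet = Tuple[str, int]  # (flow label, sequence number within that flow)


def round_robin_order(packets: Iterable[Packet],
                      key: Callable[[Packet], Hashable] = lambda p: p[0]) -> List[Packet]:
    """Transmit order when every queue gets one packet per round.

    `packets` is a snapshot of everything waiting, in arrival order; `key`
    decides which queue a packet joins (by default, its flow label).
    Plain FIFO would simply return the arrival order.
    """
    queues: Dict[Hashable, Deque[Packet]] = {}
    for pkt in packets:
        queues.setdefault(key(pkt), deque()).append(pkt)  # dicts keep insertion order

    order: List[Packet] = []
    while queues:
        for k in list(queues):  # copy the keys so emptied queues can be deleted
            q = queues[k]
            order.append(q.popleft())
            if not q:
                del queues[k]
    return order


if __name__ == "__main__":
    arrivals = [("bulk", i) for i in range(5)] + [("echo", 0), ("echo", 1)]
    print(round_robin_order(arrivals))
```

Under FIFO the two "echo" packets would wait behind all five "bulk" packets; with one queue per flow they go out in the first two rounds. If `key` lumps everything into a single queue, the schedule degenerates back to FIFO, which is why the choice of what to hash on matters.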
Remember, every dropped packet has to be retransmitted. (Too much of that leads to congestion collapse, a term I coined in 1984. That's what the "Nagle algorithm" is about.) In a world with packet-dropping routers, "slow start" makes sense. So that was put into TCP in the late 1980s (by which time I was out of networking.)
However, the RFC-documented slow start algorithm is rather conservative. RFC 2001 says to start at one maximum segment size. Microsoft's implementations in Win95 and later start at two maximum segment sizes. In RFC 3390, from 2002, the limit was raised to 3 or 4 maximum segment sizes. (We used to worry about delaying keystroke echo too much because big FTP packets were tying up the 9600 baud lines too long. We're past that.)
But Google is sending at least 8 segments at start, and Microsoft was observed to be sending 43. Sending 43 packets blind is definitely overdoing it.
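To see why the initial window matters so much for short web transfers, one can count round trips under idealized slow start: the window starts at IW segments and doubles every RTT, with no loss, no delayed ACKs, no receiver-window limit, and the handshake not counted. These simplifications and the 20-segment response size are assumptions for illustration; the IW values are the ones mentioned above.

```python
def rtts_to_deliver(segments: int, initial_window: int) -> int:
    """Round trips needed to send `segments` segments when the congestion
    window starts at `initial_window` and doubles each RTT (idealized slow
    start: no loss, no delayed ACKs, unlimited receiver window)."""
    if segments <= 0:
        return 0
    cwnd, sent, rtts = initial_window, 0, 0
    while sent < segments:
        sent += cwnd   # one window's worth goes out this round trip
        cwnd *= 2      # every segment ACKed grows cwnd by one segment
        rtts += 1
    return rtts


if __name__ == "__main__":
    # A hypothetical 20-segment response (about 29 KB at a 1460-byte MSS).
    for iw in (1, 2, 3, 8, 43):
        print(f"IW={iw:2d}: {rtts_to_deliver(20, iw)} RTTs")
```

Under these assumptions the 20-segment response takes 5, 4, 3, 2 and 1 round trips for initial windows of 1, 2, 3, 8 and 43, so on a long-RTT path the larger windows save several RTTs per connection.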
I wonder whether they're doing this blindly, or if there's more smarts behind the scenes. If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one. So, having discovered a path that's not dropping big bursts of packets, they could legitimately start fast. If they're just doing it the dumb way, starting fast every time, that's going to choke some part of the net under heavy load.
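The caching idea in the paragraph above could look roughly like this: remember the congestion window a connection to a given address ended with, and reuse it (capped) for the next connection if the entry is recent. Everything here is hypothetical: the class name, the 600-second TTL, the 16-segment cap, and the fallback of 3 segments (the RFC 3390 value at a 1460-byte MSS) are illustrative choices, not how Google, Microsoft, or any particular TCP stack does it.

```python
import time
from typing import Callable, Dict, Tuple


class CwndCache:
    """Hypothetical per-destination cache of final congestion windows (in segments)."""

    def __init__(self, default_iw: int = 3, max_iw: int = 16,
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.default_iw = default_iw  # used when we know nothing recent about the path
        self.max_iw = max_iw          # cap on how large a burst we will send blind
        self.ttl = ttl_seconds        # path conditions change, so entries expire
        self.clock = clock            # injectable so the logic can be tested
        self._entries: Dict[str, Tuple[int, float]] = {}

    def record(self, addr: str, final_cwnd: int) -> None:
        """Remember the window a finished connection to `addr` ended with."""
        self._entries[addr] = (final_cwnd, self.clock())

    def initial_window(self, addr: str) -> int:
        """Initial window for a new connection to `addr`."""
        entry = self._entries.get(addr)
        if entry is None:
            return self.default_iw
        cwnd, stamp = entry
        if self.clock() - stamp > self.ttl:
            del self._entries[addr]  # stale: fall back to the conservative default
            return self.default_iw
        # Reuse what the path sustained last time, capped, and at least 1 segment;
        # a small cached value (a lossy path) makes the next start more cautious.
        return max(1, min(cwnd, self.max_iw))
```

A design note on the sketch: the cap keeps one lucky connection from licensing a huge blind burst later, and expiry means a path that has not been probed recently goes back to the standard starting value.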
1. Re:This is well known to a small community by Jay+Tarbox · 2010-11-26 08:09 · Score: 4, Funny
  
  Are you a wizard?
2. Re:This is well known to a small community by Just+Some+Guy · 2010-11-26 08:23 · Score: 4, Funny
  
  Are you a wizard?
  No; his UID is too high. Now fetch me a sandwich, son.
  
  --
  Dewey, what part of this looks like authorities should be involved?
3. Re:This is well known to a small community by carton · 2010-11-26 08:44 · Score: 5, Insightful
  
  Yes, that's my understanding as well---the point of slow start is to go easy on the output queues of whichever routers experience congestion, so if congestion happens only on the last mile a hypothetical bad slow-start tradeoff does indeed only affect that one household (not necessarily only that one user), but if it happens deeper within the Internet it's everyone's problem contrary to what some other posters on this thread have been saying.
  WFQ is nice but WFQ currently seems to be too complicated to implement in an ASIC, so Cisco only does it by default on some <2Mbit/s interfaces. Another WFQ question is, on what inputs do you do the queue hash? For default Cisco it's on TCP flow, which helps for this discussion, but I will bet you (albeit a totally uninformed bet) that CMTS will do WFQ per household putting all the flows of one household into the same bucket, since their goal is to share the channel among customers, not to improve the user experience of individual households---they expect people inside the house to yell at each other to use the internet ``more gently'' which is pathetic. In this way, WFQ won't protect a household's skype sessions from being blasted by MS fast-start the way Cisco default WFQ would.
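  A toy model of the hashing question above, in Python, under stated assumptions: packets are equal-sized, all queues have equal weight (so WFQ approximately reduces to round robin), and "per household" is approximated by classifying on the destination address alone. The helper names and addresses are hypothetical. A 43-packet fast-start burst and one VoIP packet headed to the same household are queued together.

```python
from collections import deque
from typing import Dict, Hashable, List


def queue_key(pkt: Dict, per_flow: bool) -> Hashable:
    """Which queue a packet joins: its 5-tuple, or just the household's address."""
    if per_flow:
        return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
    return pkt["dst"]  # hypothetical per-subscriber classification


def round_robin_departures(packets: List[Dict], per_flow: bool) -> List[Dict]:
    """Serve one packet from each non-empty queue in turn.

    With equal-sized packets and equal weights, this approximates WFQ.
    """
    queues: Dict[Hashable, deque] = {}  # dicts keep insertion order
    for pkt in packets:
        queues.setdefault(queue_key(pkt, per_flow), deque()).append(pkt)
    sent = []
    while any(queues.values()):
        for q in queues.values():
            if q:
                sent.append(q.popleft())
    return sent


if __name__ == "__main__":
    burst = [{"src": "203.0.113.10", "sport": 80, "dst": "192.0.2.5",
              "dport": 50000, "proto": "tcp", "seq": i} for i in range(43)]
    voip = {"src": "198.51.100.7", "sport": 3478, "dst": "192.0.2.5",
            "dport": 40000, "proto": "udp", "seq": 0}
    for per_flow in (True, False):
        order = round_robin_departures(burst + [voip], per_flow)
        print("per-flow" if per_flow else "per-household", order.index(voip) + 1)
```

  In this model the VoIP packet leaves 2nd with per-flow queues but 44th with a single per-household queue, i.e. it waits behind the entire burst.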
  If anything, cable plants may actually make TCP-algorithm-related congestion worse because I heard a rumor they try to conserve space on their upstream channel by batching TCP ACK's, which introduces jitter, meaning the windowsize needs to be larger, and makes TCP's downstream more ``microbursty'' than it needs to be. If they are going to batch upstream on purpose, maybe they should timestamp upstream packets in the customer device and delay them in the CMTS to simulate a fixed-delay link---they could do this TS+delay per-flow rather than per-customer if they do not want to batch all kinds of packets (ex maybe let DNS ones through instantly).
  RED is not too complicated to implement in ASIC, but (a) I think many routers, including DSLAM's, actually seem to be running *FIFO* which is much worse than RED even, because it can cause synchronization when there are many TCP flows---all the flows start and stop at once. (b) RED is not that good because it has parameters that need to be tuned according to approximately how many TCP flows there are. I think BLUE is much better in this respect, and is also simple enough to implement in ASIC, but AFAIK nobody has.
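  For reference, the two AQM schemes mentioned above can be sketched in Python roughly as follows. The RED part is simplified (it omits the count-based correction that spaces out drops in the published algorithm), and the BLUE increments `d1`, `d2` and `freeze_time` are illustrative values, not recommended settings. The contrast that matters for the argument above: RED's `min_th`/`max_th` have to suit the expected load, while BLUE only nudges a single probability up on overflow and down when the link goes idle.

```python
class Red:
    """Simplified RED: EWMA of the queue length and a linear drop ramp.

    Omits the count-based correction used by the full algorithm.
    """

    def __init__(self, min_th: float, max_th: float, max_p: float,
                 weight: float) -> None:
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight = max_p, weight
        self.avg = 0.0

    def on_arrival(self, queue_len: int) -> float:
        """Update the average queue size and return the drop probability."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)


class Blue:
    """Simplified BLUE: one marking probability driven by queue-overflow
    and link-idle events instead of queue-length thresholds."""

    def __init__(self, d1: float = 0.02, d2: float = 0.002,
                 freeze_time: float = 0.1) -> None:
        self.d1, self.d2, self.freeze_time = d1, d2, freeze_time
        self.p = 0.0
        self.last_update = float("-inf")

    def on_queue_overflow(self, now: float) -> None:
        # Packet loss: the queue is too full, so mark/drop more aggressively.
        if now - self.last_update > self.freeze_time:
            self.p = min(1.0, self.p + self.d1)
            self.last_update = now

    def on_link_idle(self, now: float) -> None:
        # Empty queue: we are marking too much, so back off slowly.
        if now - self.last_update > self.freeze_time:
            self.p = max(0.0, self.p - self.d2)
            self.last_update = now
```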
  I think much of the conservatism on TCP implementers' part can be blamed on router vendors failing to step up and implement decades-old research on practical ASIC-implementable queueing algorithms. I've the impression that even the latest edge stuff focuses on having deep, stupid (FIFO) queues (Arista?) or minimizing jitter (Nexus?). Cisco has actually taken RED *off* the menu for post-6500 platforms: 3550 had it on the uplink ports, but 3560 has ``weighted tail drop'' which AFAICT is just fancy FIFO. I'd love to be proved wrong by someone who knows more, but I think they are actually moving backwards rather than stepping up and implementing BLUE.
  And I very much like your point that caching window sizes per /32 is the right way to solve this, rather than haggling about the appropriate default, especially in the modern world of megasites and load balancers, where a clever site could share this cached knowledge quite widely. But IMSHO routing equipment vendors need to step up to the TCP game, too.
4. Re:This is well known to a small community by egoots · 2010-11-26 09:26 · Score: 5, Informative
  
  Are you a wizard?
  No, he's John Nagle.
5. Re:This is well known to a small community by slashdotmsiriv · 2010-11-26 09:44 · Score: 1
  
  Respect ...
6. Re:This is well known to a small community by Iron+Condor · 2010-11-26 10:02 · Score: 2, Insightful
  
  I wonder whether they're doing this blindly, or if there's more smarts behind the scenes. If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one. So, having discovered a path that's not dropping big bursts of packets, they could legitimately start fast. If they're just doing it the dumb way, starting fast every time, that's going to choke some part of the net under heavy load.
  That strikes me as still-kinda-eighties-thinking. I guess the question is what your assumption for an unknown segment of network is: If you assume that all parts of the net are congested most of the time, then you'll want to do a fast start up only on those segments that you know can handle it (doesn't have to be an individual IP - If my ISP buffers alright and you can reach it alright then it doesn't matter how many folks are sitting downstream from them - it becomes their problem.) If, on the other hand, you have the expectation that most packets on most of the net are going to be just fine (for whatever reason; even if by sheer brute force buffering and clever back-end algorithms that figure it all out after the fact) then it makes sense to do fast start with unknown clients and omit it only on those found NOT to be able to handle it. Kinda a glass-half-full way of looking at it.
  These days I'd wager that the vast (VAST!) majority of packets are part of ongoing streams - streaming Netflix over the net, torrenting the collected porn of the 80ies, that kind of thing. Which means I'm as sure as I can possibly be of something I haven't researched that the performance of the net is only in the most marginal way dependent on startup behaviour around individual connections any more. (Or rather, when and where it is, it is probably due to the 100 TCP connections that need to be established to view a single web page; fix that and the question of startup behaviour will just go away. Incidentally, MS's CHM concept was a step very much in the right direction...)
  
  --
  We're all born with nothing.
  If you die in debt, you're ahead.
7. Re:This is well known to a small community by Jay+Tarbox · 2010-11-26 10:18 · Score: 1
  
  I can fetch it, but I will NOT make one for you.
8. Re:This is well known to a small community by seanadams.com · 2010-11-26 15:07 · Score: 1
  
  If their TCP implementation kept a cache of recent final congestion window sizes by IP address, they could legitimately start off the next connection with the value from the last one.
  Wouldn't it also be necessary to cache the _rate_ of transmission so you don't overflow some intermediate queue? E.g., imagine your server is on gigE, feeding into a 1 Mbps uplink, and then a loooong pipe to the client who is on the other side of the world. In this case you might want to have an initial cwnd of a few dozen packets, but if you were to fire them all out immediately at 1 Gbps you would lose most of them at the first hop, even though it's less than the available bandwidth*delay.
  So as I understand it this problem is more than just choosing the initial congestion window, it is also a matter of how fast you fill it. Normally that timing is driven by the acks coming back, but in the absence of that the sender needs to originate the timing.
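  The pacing point can be illustrated with a small FIFO simulation in Python. Assumptions (not from the post): 1500-byte packets, a 30-packet initial window, an unbounded queue at the 1 Mbps bottleneck, zero propagation delay between server and bottleneck, and times kept in integer microseconds to avoid rounding error. It compares firing the window at gigabit line rate with spacing the packets at the bottleneck rate.

```python
from typing import List


def serialization_us(packet_bytes: int, rate_bps: int) -> int:
    """Time to clock one packet onto a link, in whole microseconds."""
    return packet_bytes * 8 * 1_000_000 // rate_bps


def peak_backlog(arrivals_us: List[int], service_us: int) -> int:
    """Largest number of packets already waiting at a FIFO bottleneck when a
    new packet arrives (unbounded buffer, packets served in order)."""
    departures: List[int] = []
    last_departure = 0
    peak = 0
    for t in arrivals_us:
        backlog = sum(1 for d in departures if d > t)
        peak = max(peak, backlog)
        last_departure = max(t, last_departure) + service_us
        departures.append(last_departure)
    return peak


PACKET = 1500
SERVER_LINK = 1_000_000_000  # gigE at the server
BOTTLENECK = 1_000_000       # hypothetical 1 Mbps uplink
N = 30                       # "a few dozen packets" of initial window

service = serialization_us(PACKET, BOTTLENECK)
burst = [i * serialization_us(PACKET, SERVER_LINK) for i in range(N)]
paced = [i * service for i in range(N)]
print("line-rate burst:", peak_backlog(burst, service))
print("paced at bottleneck rate:", peak_backlog(paced, service))
```

  The line-rate burst leaves 29 packets waiting at the bottleneck, so any buffer smaller than that drops some of them; the paced version never queues anything. The hard part in practice is knowing the bottleneck rate before any ACKs have come back.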
9. Re:This is well known to a small community by Animats · 2010-11-26 17:56 · Score: 1
  
  These days I'd wager that the vast (VAST!) majority of packets are part of ongoing streams - streaming Netflix over the net, torrenting the collected porn of the 80ies, that kind of thing.
  True. However, many of those streams are bandwidth-adaptive and heavily buffered.
10. Re:This is well known to a small community by IAN · 2010-11-27 07:45 · Score: 1
  
  Ahhh... This is the essence of what occasionally makes Slashdot great. An unexpectedly expert post (and beautifully written, the sound you heard was that of a million grammar Nazis shrieking in frustration after scouring the text, not finding anything to complain about, and falling silent), followed by what I can only call "respectful irreverence". Splendid.
New era of networks by Anonymous Coward · 2010-11-26 07:36 · Score: 2, Interesting

Slow start and congestion avoidance were designed in the time of unreliable networks. Shouldn't TCP/IP be reconsidered in the age of fiber networks?
1. Re:New era of networks by Lennie · 2010-12-01 11:51 · Score: 1
  
  Congestion, route changes, blackholes, etc. also make for unreliable networks. I don't think we want to change it. This is just a discussion about making slow start begin with a larger window.
  
  --
  New things are always on the horizon
No Cheating is the Third Rule by Oxford_Comma_Lover · 2010-11-26 07:38 · Score: 4, Insightful

The Third rule of network design, for a moral being, is to consider the moral, ethical, and legal consequences of any atypical changes you make to your behavior.
Why the Third rule?
Because the first rule is to figure out what on earth is going on--not just in theory, but in fact. Code for the OSI model is ugly, perhaps by necessity (it has to be very fast), but it's code that is very, very easy to get wrong. It involves a lot of interacting pieces working on different levels of abstraction with other players that you don't have code control over.
The second rule is to realize when the first rule means that you shouldn't touch the stuff. Google and Microsoft have the engineering competence to mess with it--MSFT even should be messing with it, in terms of looking for ways to improve their behavior in a community-friendly way. Because they write the code that handles a huge portion of connections, and let's face it, TCP/IP just isn't designed for lots of things: AJAX or broadband, for example.
The third rule is to consider the moral and ethical and legal consequences of changes.
Only after at least these three steps should someone make changes that involve connections that go beyond the computers they control.

--
-- IANAL, this isn't legal advice, and definitely isn't legal advice for you. Also, Squee!
1. Re:No Cheating is the Third Rule by pthisis · 2010-11-26 07:51 · Score: 2, Informative
  
  Because the first rule is to figure out what on earth is going on--not just in theory, but in fact. Code for the OSI model is ugly, perhaps by necessity (it has to be very fast), but it's code that is very, very easy to get wrong. It involves a lot of interacting pieces working on different levels of abstraction with other players that you don't have code control over.
  TCP/IP predates the OSI model and conflicts with it in some areas; discussion of the complexities of code targeting OSI isn't directly applicable to TCP/IP implementations, though many similarities exist.
  Indeed, the fact that TCP/IP has fewer layers is often cited as one reason that it succeeded (coding an implementation of TCP/IP therefore being less complex than coding a fully abstracted 7-layer OSI implementation).
  
  --
  rage, rage against the dying of the light
2. Re:No Cheating is the Third Rule by teridon · 2010-11-26 17:13 · Score: 2, Insightful
  
  Isn't it time /. got a "-1 Reply Abuse" mod? The parent reply has nothing to do with the GP. It's on topic, and maybe it deserves the "Insightful" mod -- but it's replying to the top post just to appear at the top of the page. STOP THE MADNESS!
  
  --
  I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson
Tragedy of the Commons by presidenteloco · 2010-11-26 07:50 · Score: 1

You do realize that if servers on the Internet start ignoring Internet standards (RFCs) as a matter of usual practice, there is a very good chance the net will, if not grind to a halt, develop instability, unreliability, poor performance, isolated unreachable islands, etc.
This is a clear case of the tragedy of the commons. Only general adherence to RFCs and effective shunning mechanisms have prevented the tragedy from occurring so far.

--

Where are we going and why are we in a handbasket?
Web App by hey · 2010-11-26 07:53 · Score: 1

Well, this guy discovered something but wasted time he should have been working on his web app ;)
What about TCP congestive backoff and recovery? by karl.auerbach · 2010-11-26 07:58 · Score: 1

Has anyone taken a look at whether Google, Microsoft, et al are similarly pushing on the TCP congestion backoff and recovery mechanisms?
Standardization, the right way... by osu-neko · 2010-11-26 08:11 · Score: 2, Insightful

First, implement it, and show that it works in practice.
Later, standardize the proven best practices.
Google, ur doin' it rite! :D

--
"Convictions are more dangerous enemies of truth than lies."
1. Re:Standardization, the right way... by Lennie · 2010-12-01 11:54 · Score: 1
  
  There is also a draft here:
  http://tools.ietf.org/html/draft-ietf-tcpm-initcwnd-00
  The testing and analyzing is here:
  http://code.google.com/speed/articles/tcp_initcwnd_paper.pdf
  
  --
  New things are always on the horizon
Someone tell Linus it'll make his laptop go faster by Lazy+Jones · 2010-11-26 08:57 · Score: 1

Please, I'd like to use this on our web servers too... :-P

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)
Results may vary by EvilIdler · 2010-11-26 10:02 · Score: 1

I'm not getting the blinding fast response time shown in the article at all :(
Google looks up my country via geo-location and feeds me a localised version (tested via the curl method in the article). This takes 0.9 seconds for me. If I directly specify google.co.uk or some other variation, I get a more reasonable 0.3 seconds. But never 85ms. Is the author sitting on a really awesome connection at work? ;)
Any "real world" complaints? by shutdown+-p+now · 2010-11-26 10:39 · Score: 1

I understand the theoretical problem with breaking the spec, but since it actually took this guy a packet sniffer to detect the violation, it would seem that, in practice, most (all?) clients out there are perfectly capable of processing this non-standard response. If so, then I don't see a problem, since it really is a de facto standard - and those appear all the time. The best thing they could do then is publish a new RFC to make it part of the spec, and move on.
RFC 5681 by j+h+woodyatt · 2010-11-26 12:49 · Score: 2, Informative

I suppose now would be a good time to point out that RFC 5681 is the most current specification of the standard for TCP congestion control. Would it be asking too much for people to stay current on the RFC series before they start cracking off about standards compliance?
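As a reminder of what the standard behaviour looks like, here is a minimal Python sketch of the RFC 5681 window rules: slow start adds at most one SMSS per ACK, congestion avoidance adds roughly SMSS*SMSS/cwnd per ACK (at least 1 byte), and a loss sets ssthresh = max(FlightSize/2, 2*SMSS). The class name is hypothetical, and fast recovery is reduced to its entry step (cwnd = ssthresh + 3*SMSS), with window inflation and deflation omitted.

```python
class RenoWindow:
    """Hypothetical byte-counting model of the RFC 5681 congestion window."""

    def __init__(self, smss: int, initial_window: int, ssthresh: int) -> None:
        self.smss = smss
        self.cwnd = initial_window
        self.ssthresh = ssthresh

    def on_new_ack(self, bytes_acked: int) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: grow by at most one SMSS per ACK.
            self.cwnd += min(bytes_acked, self.smss)
        else:
            # Congestion avoidance: about one SMSS per RTT, at least 1 byte per ACK.
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)

    def _reduce_ssthresh(self, flight_size: int) -> None:
        self.ssthresh = max(flight_size // 2, 2 * self.smss)

    def on_retransmission_timeout(self, flight_size: int) -> None:
        self._reduce_ssthresh(flight_size)
        self.cwnd = self.smss  # loss window: one full-sized segment

    def on_three_dup_acks(self, flight_size: int) -> None:
        self._reduce_ssthresh(flight_size)
        self.cwnd = self.ssthresh + 3 * self.smss  # entry into fast recovery
```

Whatever initial window a sender picks, these loss responses are what keep a too-aggressive start from turning into sustained congestion.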

--
jhw
Re:Good for them - not a lot of choice by WaffleMonster · 2010-11-26 12:58 · Score: 1

But every rule has its exception. Here you are dealing with TCP. It's so broken, so backwards, so conservative ...
"the gods thus spake in 1989, and their prophet Van Jacobson likewise sayeth"
There are other protocols, such as SCTP, intended to address shortcomings of TCP... Yet after all these years nobody seems to care that they even exist. If TCP were as bad as your remarks suggest, I would have expected more takers on the alternatives.
You say TCP is so broken and backwards, yet you don't mention what's wrong with it, and I don't know what is.

Even browsers do crazy things on this principle. We'll open 2 (or 4, or whatever, depending on how clever you are) connections to the same web server.
Why??? The pipe(s) remain the same. Ah... but any individual connection can remain compliant. The aggregate ... not so
TCP is a head of line blocking protocol supporting only one active stream per session.
By establishing multiple connections, some can still transmit data while other TCP sessions sit idle waiting for ACKs. It makes a noticeable difference in environments with high-latency links... not so much anymore for broadband users.
Today the bigger reasons for it are just shortcomings in HTTP and browser technology stacks. If you sent everything in a single stream, there is an ordering dependency that significantly affects load time. For example, if you send a large image before sending a style sheet, page loading now needs to wait for the style sheet. You could use more intelligence and heuristics to prioritize, but the ideal dependencies are not always easy to resolve, deterministic, or knowable a priori. Sending everything at once is low-hanging fruit that for the most part works.
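The ordering effect can be put in numbers with an idealized Python model: one bottleneck of fixed rate, shared either by a single in-order stream (FIFO) or fairly among parallel connections, which is roughly what competing TCP flows tend toward. Slow start and per-connection overhead are ignored, and the 300 KB image, 20 KB style sheet and 100 KB/s rate are arbitrary example values.

```python
from typing import List


def fifo_finish_times(sizes: List[float], rate: float) -> List[float]:
    """Finish time of each object when sent back to back on one stream."""
    t, out = 0.0, []
    for size in sizes:
        t += size / rate
        out.append(t)
    return out


def fair_share_finish_times(sizes: List[float], rate: float) -> List[float]:
    """Finish times when all objects start together on parallel connections
    that split the bottleneck rate equally (fluid processor sharing)."""
    remaining = list(sizes)
    finish = [0.0] * len(sizes)
    active = set(range(len(sizes)))
    t = 0.0
    while active:
        share = rate / len(active)
        # Advance time until the smallest remaining object completes.
        shortest = min(active, key=lambda i: remaining[i])
        dt = remaining[shortest] / share
        t += dt
        for i in active:
            remaining[i] -= share * dt
        done = [i for i in active if remaining[i] <= 1e-9]
        for i in done:
            finish[i] = t
            active.discard(i)
    return finish


image_kb, css_kb, rate_kb_per_s = 300.0, 20.0, 100.0
print("single stream:", fifo_finish_times([image_kb, css_kb], rate_kb_per_s))
print("parallel:     ", fair_share_finish_times([image_kb, css_kb], rate_kb_per_s))
```

With these numbers the style sheet finishes at 3.2 s behind the image on a single stream but at 0.4 s when the two share the pipe, while the image slips only from 3.0 s to 3.2 s.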

The aggregate ... not so, but TCP doesn't constrain aggregates so one can remain in the church
What's the point of even trying? You can't constrain aggregates WRT other applications, computers, access devices, etc., so why pretend it would make any difference if it were possible at the TCP session level? In my view the only approach is for the session to be aware of its environment and live as cooperatively as reasonable within its constraints.

In the absence of this, companies like Google and Microsoft have little choice but to drive us forward using their own judgment
I'm concerned about the possibility of judgements slanted by each corporation's narrow world view. I'd prefer that open SDOs, whose members include all stakeholders, take us forward.
I'm sorry... by matunos · 2010-11-26 13:26 · Score: 1

When did RFCs become official standards at which you could "cheat"?
Consider this "cheating" to be Google's and Microsoft's comments.
Before writing a post like this... by Blakey+Rat · 2010-11-26 15:53 · Score: 1

Before writing a post like this, you might want to wait a few minutes for the inevitable corrections to the inevitably wrong Slashdot story to come in. A good 50% of the stories on this site are misleading, and probably 25% of those are blatant lies.
Here's a pro-tip: if it says kdawson either as the editor *or* the submitter, it's complete bullshit. I don't think he's ever gotten a story entirely right in his whole career.

--
Comment of the year
Re:Someone tell Linus it'll make his laptop go fas by inKubus · 2010-11-26 21:55 · Score: 1

If you type
man ip
You will see that you can set the initial congestion window on a given route using
ip route change initcwnd NUMBER
*Where NUMBER is the maximum initial congestion window (cwnd) size, in MSS, of a TCP connection. I believe applications may also set it via socket options, although most of the time it's left to the OS. So go ahead and set it to 10 or whatever.

--
Cool! Amazing Toys.
Wow, I'm impressed. by tygerstripes · 2010-11-29 00:31 · Score: 1

Do you always talk in Perl? I'm not taking sides in what seems to be an embarrassingly petty argument, but that post was truly awful to read.

--
Meta will eat itself
Re:Ad hominem attack with no facts behind you? by tygerstripes · 2010-11-29 04:28 · Score: 1

"Ad hominem attack..."
"8 digit registered LUSER ID..."
If you're going to do all the work for me, there's no point in arguing. I'll just apologise for pricking your ego and leave it there.

--
Meta will eat itself
Re:Someone tell Linus it'll make his laptop go fas by Lennie · 2010-12-01 11:55 · Score: 1

Yes, 10 has been recommended as the new initial window:
http://tools.ietf.org/html/draft-hkchu-tcpm-initcwnd-01
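For comparison with RFC 3390, the proposed larger initial window is usually written as min(10*MSS, max(2*MSS, 14600 bytes)); that is the form later published as RFC 6928, and it is assumed here rather than checked against this particular draft revision. A short Python sketch comparing the two bounds for a few MSS values:

```python
def rfc3390_iw(mss: int) -> int:
    """RFC 3390 upper bound on the initial window, in bytes."""
    return min(4 * mss, max(2 * mss, 4380))


def iw10(mss: int) -> int:
    """Proposed larger initial window (RFC 6928 form), in bytes."""
    return min(10 * mss, max(2 * mss, 14600))


for mss in (536, 1460, 9000):
    print(f"MSS {mss:>5}: RFC 3390 {rfc3390_iw(mss):>6} B, IW10 {iw10(mss):>6} B")
```

For the common 1460-byte MSS that is 3 segments versus 10; with 9000-byte jumbo frames both bounds collapse to 2*MSS.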

--
New things are always on the horizon
Re:Someone tell Linus it'll make his laptop go fas by Lennie · 2010-12-01 11:56 · Score: 1

I think you meant:
ip route change default via $GW dev eth0 initcwnd 10
Where $GW is your default gateway.

--
New things are always on the horizon
Re:Good for them - not a lot of choice by Lennie · 2010-12-01 12:02 · Score: 1

HTTP only supports one active stream over TCP or SSL/TLS; SPDY is a proposal to allow HTTP over TCP or SSL/TLS to support multiple streams:
http://www.chromium.org/spdy/spdy-whitepaper
I'm guessing that only multiplexing HTTP streams over SSL/TLS will be really backward compatible with the existing internet.
So soon, HTTPS (i.e., SSL/TLS) with the SPDY extension may even load your webpage faster than plain HTTP.

--
New things are always on the horizon