Why IE Is So Fast ... Sometimes
safrit writes "Finally the scoop on how IE "cheats" a little to up its performance! Do RFCs mean nothing anymore? What's next, Riots in the streets, dogs and cats living together, mass hysteria!
From the blog story: 'Internet Explorer on Windows always seems either to run impossibly fast (page requests are fulfilled almost before the mouse button has returned to its original unclicked position), or ridiculously slow...' Now read to see why..."
...but the site has already been slashdotted! I suppose I'll just read it late tonight after the "mass hysteria" has settled.
The same powers that make IE impossibly fast also made this site crash impossibly fast. :)
Sigs? We don't need no stinking sigs!
Straight from the site.......
...And that's it. The client doesn't FIN, and the server doesn't ACK. In other words, the connection is kept "half-open" on the server end. The reason for this? Why, to make subsequent connections from IE clients faster. If the connection isn't torn down all the way, all IE has to do is send an HTTP request, with no preamble-- and the server will immediately respond. Ingenious!
Internet Explorer on Windows always seems either to run impossibly fast (page requests are fulfilled almost before the mouse button has returned to its original unclicked position), or ridiculously slow (as with the weird stalling-on-connect problem that many people, including myself, have noticed).
One possible explanation is something that my team and I noticed a couple of years ago, in analyzing packet traces of IE's connection setup procedure. Microsoft might have fixed this since then; I'm not sure. But it's a possible culprit.
First of all, for those rusty on their TCP/IP-- here's how a normal HTTP request over TCP should work:
Client Server
1. SYN ->
2. <- SYN/ACK
3. ACK ->
4. Request ->
This is how the client and server synchronize their sequence numbers, which is how a connection gets established. The client sends a synchronization request, the server acknowledges it and sends a synchronization request of its own, and the client acknowledges that. Only then can the HTTP request proceed reliably.
The server's SYN (synchronize) and ACK (acknowledgement) packets are combined for speed; there's no reason to send two separate packets, when you're trying to get a connection established as quickly as possible. Another speed enhancement that Mac OS 9's stack uses, by the way, is to combine the client's ACK and the HTTP request into a single packet; this is legal, but not frequently done. The idea is that within the structure of TCP/IP, you want to minimize the number of transactions that need to take place in setting up the two-way handshake necessary before you can send the HTTP request.
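For anyone who wants to see the sequence-number bookkeeping spelled out, here is a minimal Python sketch of the handshake described above. It is a toy model and not a TCP stack: the function name and the tuple layout are made up for illustration, and the only rule it encodes is that each side acknowledges the other's initial sequence number plus one.

```python
MODULUS = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (sender, flags, seq, ack).

    Hypothetical helper for illustration only; ack is None when the
    segment carries no acknowledgement.
    """
    syn = ("client", "SYN", client_isn, None)
    # The server acknowledges the client's SYN and sends its own SYN
    # in the same packet, which is the combined SYN/ACK.
    syn_ack = ("server", "SYN/ACK", server_isn, (client_isn + 1) % MODULUS)
    # The client acknowledges the server's SYN; only now is the
    # connection synchronized in both directions.
    ack = ("client", "ACK", (client_isn + 1) % MODULUS, (server_isn + 1) % MODULUS)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(1000, 5000):
        print(segment)
```

The HTTP request would follow as a fourth segment, or ride along with the final ACK as the Mac OS 9 stack reportedly does.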
When tearing down a connection, it looks like this:
Client Server
1.
3. FIN ->
4.
Uh... what? Dunno what the hell this is. I'll ignore it, or RST.
2. Oh, you're a standard server. Okay: SYN ->
3.
5. Request ->
In other words, instead of sending a SYN packet like every other TCP/IP application in the world, IE would send out the request packet first of all. Just to check. Just in case the HTTP server was, oh, say, a Microsoft IIS server. Because IIS' HTTP teardown sequence looked like this:
Client Server
1.
They probably called it "Microsoft Active Web AccelerationX(TM)®" or something.
(I may be remembering this incorrectly; it might be that the client does FIN, and the server simply keeps the connection around after it ACKs it. Instead of shutting down the connection entirely, it just waits to see if that client will come back, so it can open the connection back up immediately instead of having to go through that whole onerous SYN-SYN/ACK procedure. Damn rules!)
Now, what does this mean for non-IIS servers? It means that if you use IE to connect to them, it first tries to send that initial request packet, without any SYNs-- and then it only proceeds with the standard TCP connection setup procedure if the request packet gets a RST or no response (either of which is a valid way for a legal stack to deal with an unsynchronized packet). But IIS, playing by its own rules, would respond to that packet with an HTTP response right away, without bothering to complete the handshake. So IE to IIS servers will be nice and snappy, especially on subsequent connections after the first one. But IE to non-IIS servers waste a packet at the beginning of each request-- and depending on how the server handles that illegal request, it might immediately RST it, or it might just time out... which would make the browser seem infuriatingly slow to connect to new websites.
This is only marginally less stupid than RunTCP's "solution"-- and I say "marginally" only because in the grand scheme of things, this probably makes sense to Microsoft's network engineers. After all, eventually all clients will be Windows platforms running IE, and all servers will be Windows platforms running IIS. And then we can break all kinds of rules! Rules are only there to hold us back and force us to play nice with other vendors. Well, once the other vendors are all gone, who cares about some stupid RFC?
I have to admire their arrogance and their confidence. But it'll be some time before I can bring myself to admire their technical integrity.
These pretzels are making me thirsty.
...but I think that's because during the build process it caches the entire web, hence the build time!
Ah, damn Mozilla.
Heck, IE still uses an HTTP Accept line with */* at the end without quality ratings rather than a more complete one, like Mozilla's. Reason? It saves a few bytes.
Example:
IE 6/Win: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, */*
Mozilla: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
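To make the "quality ratings" point concrete, here is a small Python sketch of how a server might rank the media types in an Accept header. The function name is hypothetical and the parsing is deliberately simplified (it ignores parameters other than q); a type with no q value is treated as q=1.0, which is the HTTP default.

```python
def parse_accept(header):
    """Return (media_type, q) pairs sorted from most to least preferred.

    Simplified illustration: only the q parameter is understood.
    """
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        preferences.append((media_type, quality))
    # sorted() is stable, so equal-q types keep their header order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(parse_accept("text/html;q=0.9,text/plain;q=0.8,*/*;q=0.1"))
```

With an IE-style header every entry comes out at q=1.0, so the server learns nothing about which type the browser would actually prefer.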
For Opera to get its "Fastest browser on earth" title, it caches EVERYTHING. Even things that aren't supposed to be cached like SSL pages.
Does anyone know if this sequence is there for security purposes? It looks like this might lead to a spoofing vulnerability.
ato
i wonder about the relationship between this and the standard keepalive protocol, which basically is a standard that keeps a connection open for a certain amount of time so the browser doesn't have to keep opening new tcp connections for each image or whatever.
i would assume that the keepalive protocol reduces the ill effects of this system, since once a connection is made it doesn't have to be torn down and reestablished, or at least not for each request.
No, it doesn't. In fact, it doesn't even cache any page that's protected by a password, nor does it add them to the list of recently visited addresses (which is nice both for security and privacy reasons).
They are only kept in the RAM cache (i.e., when you press "back" or "forward", it will usually show you a page's last state, down to the position of the scroll bars, without reloading it). This is quite useful, BTW; it means you can go back and forth between pages without losing what you were writing in a form (unlike MSIE, where forms are reset).
RMN
~~~
And this is the highest irony...
That the poster did it as an AC... which means they get no karma.
Ooh. Double-dumbass.
Want to Know How to Cheat the GPL? Read On!
This almost makes me want to break some other rules and hack my TCP stack to send back some other amusing responses to unsynchronized packets - perhaps a ping of death or an invalid OOB packet (WinNuke)?
Hmmm. Deliberately breaking -- oh, I'm sorry, "rewriting" -- one of the core technologies of the Internet, without telling anyone and in such a way as to pad their speed numbers? Nah, nothing wrong about that...
The Mongrel Dogs Who Teach
i've ran Netscape 4.1 on my pentium 133 with a 28.8 kps modem, but web pages load instantly on my box. whats my secret??
Well i'll tell you !!
i upgraded to IE 6.0* and the web pages popped up instantly !! even the pop-ups where there just as quickly
using IE increased the speed of web browsing on the internet for me, it can for you too !
*Note: to run IE 6.0 i also had to upgrade to a more recent AMD XP system running Windows XP and a 1.5mbs Cable Modem service which had a 98% impact on page load and rendering times.
I follow the SDK and GDN principles.. Spelling Dont Kount, Grammer Dont Neither
A custom application we run at work makes use of the IE ftp client to make automated connects to our ftp server. Any other client, Linux or Windows, disconnects from the server on shutdown. IE or the IE-based ftp client don't, even if you exit IE. Because of this we've been forced to set a session idle timeout of 1 minute on the server to avoid hanging connections. Is this another example of the same technique, client-side?
Of course, when your target market is non-scalable toy computers, who cares if your software isn't scalable either.
The thing I don't understand... Isn't this somewhat like keepalive and pipelining?
I normally hate Microsoft, and think they are up to massive conspiracies. However, in this case, it seems more to me like a legitimate innovation, as opposed to some elaborate scheme. I fail to understand what is 'evil' about this: isn't this a good thing?
________________________________________________
suwain_2
"One possible explanation is something that my team and I noticed a couple of years ago"
They had IE 3 a "couple" of years ago. This article is based on faulty data from two or three years ago, which the author even admits.
Maybe the editors should read the links in stories before posting the stories, it gives Slashdot a bad name to be posting articles like this.
IE's other trick, or so it is assumed (since the source isn't available) is that it does full DOM and JS caching.
That is to say, if you visit a webpage with (say) Mozilla, the HTML is interpreted and the HTML tree is built in memory. Pages with advanced CSS have a more complicated tree, of course. However, when the user leaves the page, that tree is destroyed and has to be recreated each time the user visits the page.
The bug to correct this in Mozilla is bug 38486 - "[FEATURE] Keep DOM and JS context in memory to provide fast access when clicking back". You can also vote for it (free Bugzilla account required) though you'll have to copy-n-paste the URL into your browser window since Bugzilla doesn't accept referrers from Slashdot.
PS Threaded e-mail is handy, eh? It sure is, unless your mail reader doesn't remember that you want to see your mailboxes in threaded view and keeps reverting back to collapsed form. That one is bug 64426 (vote for it if you like).
Alex Bischoff
HTML/CSS coder for hire
that cannot be... surely there'd be an endless amount of problems with stateful firewalls. not to mention that isa and msproxy server would have to support this.
are we sure that the author just doesn't understand persistent connections???
a simple netstat -a would show you if the connection was kept open... i'm using squid as my proxy so can't test this.
No matter how fast or how slow IE is, a lot of people are still stuck using it because there are just some sites that are Windows-centric. Some sites just don't work or look like crap if you're using something else.
Speaks pretty poorly of the server (or network architecture) if your only recourse is to say "it's the client's fault!"
4069902 TCP in 2.5.1 should have similar slow start mechanism as in 2.6 13 Aug 1997
/dev/tcp tcp_slow_start_initial 2
1) TCP BASICS - SLOW START AND DELAYED ACK
The TCP specification requires something known as "slow start". The
algorithm applies to the sender side and is described in RFC2001.
The intent of the slow start algorithm is to avoid a "congestion
collapse" in a network by ensuring that each TCP sender doesn't
overwhelm the network. The algorithm mandates that the first
transmission be a single packet. If the recipient acknowledges
the first packet successfully (i.e. the communication doesn't time
out and the recipient believes that the packet has arrived without
error), the sender sends two more packets. Successful transmission
results in the sender sending yet more packets in parallel, until
the capability of the underlying network is reached and one or more
packets are not acknowledged successfully. Essentially the sender
uses ACKs as a "clock" to regulate and gradually increase the
rate packets are injected into the network until it reaches an
equilibrium.
The TCP specification describes another technique known as
"delayed ACK", which concerns the receive side. The technique
calls for an acknowledgement of a data packet to be delayed for a
short period of time - the delayed-ACK interval. Different TCP
implementations use different delay intervals. The TCP specification
(RFC1122) mandates that the delayed-ACK interval must be less than
0.5 second. Delayed ACK serves to give the application an opportunity
to send an immediate response, in which case the ACK can be
piggyback'ed with the packet carrying the response. This technique
is very useful, both in saving the network bandwidth and in reducing
the protocol processing overhead, and is widely adopted by TCP
implementations. The TCP standard also recommends that an ACK not
be delayed for more than two data packets. This is to keep the slow
start algorithm on the sender side flowing, which counts on the ACK
packets coming back from the receive side in order to strobe more
data packets into the network.
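As a rough illustration of the slow start behaviour described above, the following Python sketch counts how many packets go out in each round trip. It is an idealized model resting on assumptions that are not in the note: no packet loss, every packet acknowledged promptly, and a window that doubles each round trip. The function name is made up.

```python
def slow_start_rounds(total_packets, initial_window=1):
    """Return the number of packets sent in each round trip.

    Idealized sketch: the window starts at initial_window and doubles
    every round trip, with no loss and no delayed ACKs.
    """
    if total_packets < 0 or initial_window < 1:
        raise ValueError("need total_packets >= 0 and initial_window >= 1")
    rounds = []
    sent = 0
    window = initial_window
    while sent < total_packets:
        burst = min(window, total_packets - sent)
        rounds.append(burst)
        sent += burst
        window *= 2  # each acknowledged packet lets the sender add one more
    return rounds


if __name__ == "__main__":
    # An 8KB response split into roughly 1460-byte packets needs about 6.
    print(slow_start_rounds(6))                    # starting at one packet
    print(slow_start_rounds(6, initial_window=2))  # starting at two packets
```

Under these assumptions a six-packet response takes three round trips starting from one packet and two round trips starting from two.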
2) TCP SENDER/RECEIVER DEADLOCK - THE IDLE TIME
A simplistic implementation of delayed ACK can cause unnecessary
idle time during the initial data transfer phase in a client-server
network environment. The scenario is as follows. When a sender
request can't fit in one TCP packet, TCP will break it up into
multiple packets. During the initial slow start phase, the sender
is allowed to send only one packet. Therefore only a partial sender
request is sent. The receiver application, upon receiving the
data in the packet, is not able to respond because the data is
incomplete. In the mean time, the receiver TCP is holding back the
ACK, waiting for the second data packet to show up. But the sender
TCP is waiting for an ACK to come back before sending more data - a
temporary deadlock. Eventually, the receiver TCP will give up the
waiting after a delayed-ACK interval, and send back an ACK.
This interplay of a simplistic delayed-ACK implementation with
slow-start algorithm accounts for the idle time problem seen in a
number of WEB benchmarks. These benchmarks employ HTTP response
messages of at least 8KB and usually more. On a typical network,
this size of data requires more than one TCP packet to carry.
During the idle time, the client TCP holds back the acknowledgement
of the first packet while the client HTTP is waiting for the rest
of the response data from the server before it can issue the next
HTTP request. But the server is waiting for the client TCP to ACK
before it can send the rest of response data.
3) SOLARIS CLIENTS - NO DELAY ON INITIAL ACK
Only configurations with clients that use a simplistic delayed ACK
implementation, e.g. Windows/NT, will exhibit the idle time problem
when talking to a Solaris server. Configurations using Solaris
clients are not affected by this problem because Solaris uses a more
sophisticated delayed-ACK algorithm. It recognizes the initial data
transfer phase, and will not delay the acknowledgement of the first
data packet.
4) SLOW START BUG - NO MORE IDLE TIME
Configurations using a server running Windows/NT, or an OS with a
BSD derived TCP stack don't exhibit this idle time problem. This
is, rather ironically, due to a widespread bug in the slow start
implementation in both Windows/NT and BSD derived TCP stacks.
The bug in the server erroneously takes the last ACK in the TCP 3-way
connection handshake as an indication of a data packet successfully
going through the wire. Therefore, when the server is ready to send
back the first response, it is allowed to send TWO, instead of one
TCP packet. The client, upon receiving two packets, will ACK
immediately as suggested by the TCP specification.
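The interaction in sections 2) and 4) can be reduced to a small Python sketch. It is a simplification under stated assumptions: the receiver acknowledges at once when two packets arrive, a lone first packet waits out the delayed-ACK timer, and the timer is taken as 200 ms (a typical figure, not a measured one). The function name and parameters are hypothetical.

```python
def initial_stall_ms(response_packets, initial_window, delayed_ack_ms=200):
    """Estimate the idle time at the start of a transfer, in milliseconds.

    Simplified model: a stall occurs only when the sender may emit a
    single packet but the response needs more than one, so the receiver
    sits on its ACK until the delayed-ACK timer expires.
    """
    lone_first_packet = initial_window < 2
    more_data_waiting = response_packets > initial_window
    if lone_first_packet and more_data_waiting:
        return delayed_ack_ms
    return 0


if __name__ == "__main__":
    # Strict slow start (one packet) versus the two-packet start that the
    # Windows/NT and BSD bug produces, for a six-packet response.
    print(initial_stall_ms(6, initial_window=1))
    print(initial_stall_ms(6, initial_window=2))
    # A server handling 100 short connections pays the stall each time.
    print(100 * initial_stall_ms(6, initial_window=1))
```

A response that fits in one packet never stalls in this model, which matches the note's point that the problem shows up with responses of 8KB or more.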
5) BREAKING DEADLOCK - THE WORKAROUND
A new TCP tunable "tcp_slow_start_initial" has been added to the
Solaris 2.6 release. The default value is one (1), which gives the
same behavior as Solaris 2.x releases prior to 2.6, and is fully
compliant with the current TCP slow-start standard (RFC2001).
The amount of the extra delay described above depends on the
delayed-ACK interval of the client's TCP stack, and is usually on
the order of 200 milli-seconds. For a normal TCP connection, this
delay is hardly noticeable. Nevertheless, it may not be true in an
environment that employs many short-lived connections, or connections
transmitting only a small amount of data. A good example is a WEB
server. In those environments, one should consider changing
"tcp_slow_start_initial" from the default value of one (1) to two (2).
The potential downside of the change is that, with many clients all
starting at two packets instead of one, more network congestion
might be introduced. IETF (Internet Engineering Task Force, the
industry group that governs the Internet standards), after recognizing
the problem described here and the prevalence of the slow start bug
described in 4) only recently, conducted a preliminary study over the
global Internet on the effect of amending the slow start algorithm
to start at two packets instead of one. The study found no evidence
that the change caused more congestion. It's still conceivable,
although rare, that on a configuration that supports many clients on
very slow links, the change might induce more network congestion.
Therefore the change of "tcp_slow_start_initial" should be made with
caution.
Sun is actively participating in an effort in IETF to revise TCP
specification to allow more packets to be sent initially. Once the
revision is ratified, Sun will take the appropriate actions to
upgrade Solaris TCP accordingly.
6) COMMANDS FOR THE WORKAROUND (Solaris 2.6 only)
> su to root
> ndd -set
See ndd(1M) for an explanation of the tuning facility.
What's wrong with that? Mozilla didn't think of it first.
Err, I don't think so. From what I've read about HTTP KeepAlive, the connection should be kept alive by adding a "Connection: KeepAlive" header to the request or something like that. I can't imagine any reason why any protocol should want to interfere with the TCP handshaking sequence for keepalive purposes. That would mean crossing out of the application layer into the transport layer.
This issue caused me a lot of grief last year, and I am just figuring out why. We set up a webmail server using Apache/Vhosts and OpenSSL, and we had this recurring problem of links just suddenly breaking in IE... It'd just return "Page could not be loaded" or something like that. The problem never cropped up in Mozilla or other browsers, and eventually I found out that if I added this line:
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
to the virtual host configuration, the problem went away. Now that I've read this article, I think I understand why. What I think is happening here is that Microsoft is trying to make the most out of keepalive/persistent connections by bending the rules. And it's not right.
Am I a hipster-doofus?
The way I understood it was there's 2 forms of communication going on between the client and server. For simplicity, I'll use an analogy.
It's sort of like making a telephone call in 1 of two ways:
The first way - Call a friend on the phone, and have an entire conversation, but never do the formality of a "Hello" or "Bye" at the beginning or end of the call and don't hang up even if you've run out of things to say.
The second way - Call a friend on the phone, but ring them individually for each and every word of the entire conversation, and be sure to include the formality of "Hello" and "Bye" with each and every call.
Maybe I have a weird way of reading this, but that's what I got out of it.
__________________________________
Free your mind - Flush your toilet
Well, because IE leaves server-side connections open, it would make things much more difficult for the server-end, no matter if you run IIS or not. So, it can basically be considered a low-level DoS attack on all non-IIS servers.
Wouldn't you be upset if IE pre-cached all the links in a page, just so users would have a bit of a speed boost? If they wanted IE/IIS to be faster when speaking with each other, why not have them communicate on a different port, instead of causing problems and slow-downs on non-IIS servers? Hey, they could use port 80, UDP... That would be faster, and since non-IIS servers won't be using UDP/80, it won't be incorrectly leaving connections open, sending invalid packets, slowing down communications with non-IIS servers, etc.
It's not that they sped things up, it's that they did it in such a way that it causes minor problems for servers that don't use IIS. Sound a little like the Microsoft Java fiasco a little while back? Leveraging their desktop monopoly to sell IIS...
So, this is an overly bad idea, and there are a thousand ways they could have done this better, while not causing problems for non-Microsoft products.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Even if the Mozilla team did come up with this idea, it would never be implemented. Why? Standards compliance. That's been their goal from the beginning - they would never break a standard, especially as fundamental as TCP/IP.
This just goes to show the differences between Microsoft and many open source projects. Microsoft didn't care at all about the impacts of this decision - as long as it makes IE and IIS look faster, it's in. However, Mozilla/Apache/etc. aren't willing to sell out.
Sounds to me like this blog is describing pipelining, which is a standard part of HTTP 1.1...
What is HTTP pipelining?
Normally, HTTP requests are issued sequentially, with the next request being issued only after the response to the current request has been completely received. Depending on network latencies and bandwidth limitations, this can result in a significant delay before the next request is seen by the server.
HTTP/1.1 allows multiple HTTP requests to be written out to a socket together without waiting for the corresponding responses. The requestor then waits for the responses to arrive in the order in which they were requested. The act of pipelining the requests can result in a dramatic improvement in page loading times, especially over high latency connections.
Pipelining can also dramatically reduce the number of TCP/IP packets. With a typical MSS (maximum segment size) of 512 bytes, it is possible to pack several HTTP requests into one TCP/IP packet. Reducing the number of packets required to load a page benefits the internet as a whole, as fewer packets naturally reduces the burden on IP routers and networks.
HTTP/1.1 conforming servers are required to support pipelining. This does not mean that servers are required to pipeline responses, but that they are required to not fail if a client chooses to pipeline requests. This obviously has the potential to introduce a new category of evangelism bugs, since no other popular web browsers implement pipelining.
When should we pipeline requests?
Only idempotent requests can be pipelined, such as GET and HEAD requests. POST and PUT requests should not be pipelined. We also should not pipeline requests on a new connection, since it has not yet been determined if the origin server (or proxy) supports HTTP/1.1. Hence, pipelining can only be done when reusing an existing keep-alive connection.
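The rule above is easy to sketch in Python. This is only an illustration of the policy as the FAQ states it (GET and HEAD only, and only on a reused keep-alive connection); the function names and the example host are invented, and nothing is sent over the network.

```python
# Methods the FAQ says may be pipelined.
PIPELINABLE_METHODS = {"GET", "HEAD"}


def can_pipeline(method, reusing_keepalive_connection):
    """Apply the FAQ's rule: pipeline only GET/HEAD on a reused connection."""
    return method.upper() in PIPELINABLE_METHODS and reusing_keepalive_connection


def build_pipeline(host, paths):
    """Concatenate several GET requests into one buffer for a single write."""
    requests = (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
        for path in paths
    )
    return b"".join(requests)


if __name__ == "__main__":
    payload = build_pipeline("www.example.com", ["/", "/logo.gif", "/style.css"])
    # Three small requests fit comfortably inside a 512-byte segment.
    print(len(payload), "bytes for 3 requests")
    print(can_pipeline("GET", True), can_pipeline("POST", True))
```

Writing the whole buffer in one call is what lets several requests share a single TCP packet, which is where the packet savings mentioned earlier come from.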
How many requests should be pipelined?
Well, pipelining many requests can be costly if the connection closes prematurely because we would have wasted time writing requests to the network, only to have to repeat them on a new connection. Moreover, a longer pipeline can actually cause user-perceived delays if earlier requests take a long time to complete. The HTTP/1.1 spec does not provide any guidelines on the ideal number of requests to pipeline. It does, however, suggest a limit of no more than 2 keep-alive connections per server. Clearly, it depends on the application. A web browser probably doesn't want a very long pipeline for the reasons mentioned above. 2 may be an appropriate value, but this remains to be tested.
What happens if a request is canceled?
If a request is canceled, does this mean that the entire pipeline is canceled? Or, does it mean that the response for the canceled request should simply be discarded, so as not to be forced to repeat the other requests belonging to the pipeline? The answer depends on several factors, including the size of the portion of the response for the canceled request that has not been received. A naive approach may be to simply cancel the pipeline and re-issue all requests. This can only be done because the requests are idempotent. This naive approach may also make good sense since the requests being pipelined likely belong to the same load group (page) being canceled.
What happens if a connection fails?
If a connection fails or is dropped by the server partway into downloading a pipelined response, the web browser must be capable of restarting the lost requests. This case could be naively handled equivalently to the cancelation case discussed above.
The1Genius - Littera Scripta Manet
So what you're saying is that MSIE is responsible for a lot of the /. effect? It seems that all of those windows-using /. readers might think once or twice about their OS or browser if they know that they're ruining the Internet for everyone else.
As the owner and operator of a small commercial web hosting outfit I wholeheartedly agree. Just two days ago one of my clients' sites got slashdotted.
/. traffic accounts for less than 1% of my servers' total traffic, it just happens to happen over a short period of time. It is not economical for me to have 99% idle bandwidth for the 0.01% of the time that it is needed. Also, you trolls aren't paying for it, I am.
It is extremely annoying to see posts about poor server configuration from the losers who post here. The server is seldom the issue, the bandwidth is. My server gets slashdotted about once a month and every time the server load is nominal, yet my two T1s get crushed. Of course I surcharge my clients responsible for this as it creates problems for the rest of my clients.
Some responsible behavior on the part of Slashdot editors/administrators is in order. It doesn't take a genius to figure out which sites may survive a slashdotting and which may not. When in doubt, ask.
As for the trolls that whine like little bitches about lack of bandwidth,
You obviously have no clue about networking. Keep-alives are implemented at a MUCH higher level, using a keep-alive header to keep the connection open.
The sequences described here are low level packet tweaks which are not RFC compliant at all. They leave connections in a half closed state in case another non RFC compliant request comes in.
SO what happens? It makes IE requests complete faster on IIS, but non IE requests slower due to an extra handshake due to the connection being half closed.
Top Most Bizarre/Disturbing Error Messages
... and /. is renowned for getting news to its readers in a timely fashion, so this would be intolerable.
Which is a standard. What is everyone complaining about?
Of course not. Microsoft would never do such a thing. I scoff at your ridiculous suggestion, good sir.
You might want to look into HTTP 1.1 as well. In fact, so should Microsoft, because (if the article is accurate) they've apparently re-invented the wheel in square form.
Standard HTTP 1.1 keepalive still uses a regular, plain, vanilla TCP connection. No FIN packets until the connection actually is finished. It simply doesn't close the connection, allowing further requests on the same connection (because the connection is still open). The connection is closed - using the standard methods - when one side decides to close the connection (eg. after a timeout).
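If you want to watch ordinary keep-alive at work without a packet sniffer, here is a self-contained Python sketch using only the standard library. It starts a throwaway HTTP/1.1 server on the loopback interface, makes several requests over one `http.client.HTTPConnection`, and records the client's local port each time. The handler and function names are made up for this example; if the port never changes, the same plain TCP connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client know where the response ends
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_ports_for_requests(count):
    """Return the client's local port as seen after each of count requests."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    ports = []
    try:
        for _ in range(count):
            conn.request("GET", "/")
            conn.getresponse().read()
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()  # the normal FIN/ACK teardown happens here
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    print(client_ports_for_requests(3))
```

No half-open tricks are involved: the connection simply stays open between requests and is torn down the usual way when the client closes it.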
What is described in the article is a bastard half-closed connection, which is completely unnecessary unless your goal is gratuitous violation of the TCP spec.
Comment removed based on user account deletion
The blog describes the full HTTP transaction process as:
Which IE (allegedly) "hacks" and the transaction really goes like:
If this is true, then IE saves 2 round trips per connection. Clients generally open 4 connections per server, and keep them open (alive) until they've downloaded the page and all supporting files. So IE possibly saves 8 round trips per page with this (alleged) hack.
For domestic dialup connections, the average round-trip latency is 60ms. DSL is around 40, while cable is around 20. Ping slashdot.org to find out the latency of your connection.
So, for a domestic dialup user connecting to an IIS server, a straight request (with no handshake) would save 8*0.06s = 0.48s. The page mentions combining SYN/ACK packets, so this may even be less of a savings.
An 0.48s cheat in page load times hardly makes IE "impossibly fast" when page load times over a modem typically run > 20s.
Also, don't forget that this blog also talks about non-IIS servers balking at this non-standard connection setup with an RST packet. That adds 4*0.06 = 0.24s to page load times on, say, Apache servers. If true, that doesn't make IE "ridiculously slow," either.
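The back-of-envelope numbers above can be checked with a few lines of Python. The helper name is made up, and the inputs are just the figures quoted in this comment (4 connections per page, 60 ms dialup round trip, a 20 s page load); times are kept in whole milliseconds to avoid floating-point noise.

```python
def handshake_cost_ms(round_trips_per_connection, connections, rtt_ms):
    """Total time spent on (or saved from) handshake round trips, in ms."""
    return round_trips_per_connection * connections * rtt_ms


if __name__ == "__main__":
    saved = handshake_cost_ms(2, connections=4, rtt_ms=60)    # IE to IIS
    penalty = handshake_cost_ms(1, connections=4, rtt_ms=60)  # IE to a server that RSTs
    page_load_ms = 20_000
    print(f"saved {saved} ms, about {100 * saved / page_load_ms:.1f}% of the page load")
    print(f"penalty {penalty} ms, about {100 * penalty / page_load_ms:.1f}% of the page load")
```

Either way the effect is a couple of percent of a modem page load, which is the point being made here. A server that silently drops the packet instead of sending RST is a different case, since the cost is then a timeout and not a round trip.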
It all goes downhill from first post
Read the article closely. A request from IE to a non-ms server will take longer than a request from a normal browser using compliant TCP to the same server. This not only gives IE a speed advantage with IIS, it makes non-ms servers appear slower than they actually are when you use IE! The only speed advantage is with IE and IIS. As I remember, this sounds like part of the antitrust case against microsoft.
Does IE have a custom TCP layer in it?
You're kidding right? IE is not some stand-alone program. It has MANY links into low level microsoft stuff where it is 'part' of the Windows OS. This was the whole argument of M$'s lawyers, that IE couldn't be removed easily.
So it wouldn't surprise me if IE had access to some special stack API to pull stuff like this. Would not surprise me at all.
Top Most Bizarre/Disturbing Error Messages
The parent +5 post is flat out wrong. This is not about persistent connections, which is a high-level HTTP feature that keeps a connection open so that the browser can send more requests. This is about a low-level TCP hack that IE uses to get a small speed boost on IIS servers, while breaking TCP standards compliance.
If I read the article correctly, instead of creating a new TCP connection and then sending a request, IE sends the request immediately without bothering to finish the TCP handshake. Microsoft IIS web servers deal with it automatically, and it is faster because it saves a round-trip wait for the ACK and the following request.
The down side is that non-IIS servers have no clue what this incoming packet is. It must be invalid because it is not a SYN. So it gets thrown away, and the server might or might not reset the connection. If a non-IIS server resets the connection, IE goes with a standard TCP handshake and has wasted only the round trip time for the request packet and the RST. But if the server swallows the invalid packet and does not send a RST, then Internet Explorer will just sit around for a few seconds until it times out and falls back to a standard TCP connection.
The summary is that IE is breaking the TCP protocol for a small speed boost when connecting to IIS servers. It results in a small speed penalty when connecting to most non-IIS servers. When connecting to non-IIS servers that do not reset the connection, it results in a very noticeable delay.
It could also be a potential security risk, because if this is true, then it makes it very easy to IP-spoof a HTTP request against IIS (since the request is a self-contained packet instead of a long connection sequence).
Here's a tcpdump for www.microsoft.com, on an XP box:
03:47:16.259661 10.0.0.52.1328 > www.us.microsoft.com.http: S 2485226999:2485226999(0) win 16384 (DF)
03:47:16.279661 www.us.microsoft.com.http > 10.0.0.52.1328: S 631604626:631604626(0) ack 2485227000 win 65535 (DF)
03:47:16.289661 10.0.0.52.1328 > www.us.microsoft.com.http: . ack 1 win 17520 (DF)
03:47:16.289661 10.0.0.52.1328 > www.us.microsoft.com.http: P 1:398(397) ack 1 win 17520 (DF)
03:47:16.339661 www.us.microsoft.com.http > 10.0.0.52.1328: . ack 398 win 65139
And here's for www.msn.com:
03:50:22.169661 10.0.0.52.1397 > www.msn.com.http: S 2535664221:2535664221(0) win 16384 (DF)
03:50:22.199661 www.msn.com.http > 10.0.0.52.1397: S 3601141750:3601141750(0) ack 2535664222 win 65535 (DF)
03:50:22.209661 10.0.0.52.1397 > www.msn.com.http: . ack 1 win 17520 (DF)
03:50:22.209661 10.0.0.52.1397 > www.msn.com.http: P 1:391(390) ack 1 win 17520 (DF)
03:50:22.269661 www.msn.com.http > 10.0.0.52.1397: . ack 391 win 65146
These look like perfectly valid TCP handshakes. I did notice that when refreshing a site, IE reuses the previous connection, but that's legal (assuming it used Connection: Keep-Alive in the HTTP header; I didn't verify that).
The samples were taken on my network's gateway, which is a Linux box, hence impartial :)
But don't take my word for it. Try it yourself!
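If you capture your own trace and don't feel like eyeballing it, something like this can check the first three packets for you. It is only a sketch: it assumes the old-style tcpdump text format shown above, the regex and the function name are my own, and TRACE is the first three lines of the www.microsoft.com capture with the line-wrap spaces removed.

```python
import re

# Old-style tcpdump line: "<time> <src> > <dst>: <flags> <rest>"
LINE = re.compile(r"^\S+ (\S+) > (\S+): (\S+) (.*)$")


def handshake_ok(lines):
    """True if the first three packets are SYN, SYN+ACK, ACK between one pair."""
    pkts = []
    for line in lines[:3]:
        m = LINE.match(line.strip())
        if m is None:
            return False
        src, dst, flags, rest = m.groups()
        pkts.append((src, dst, flags, "ack" in rest.split()))
    if len(pkts) < 3:
        return False
    (c1, s1, f1, a1), (s2, c2, f2, a2), (c3, s3, f3, a3) = pkts
    return (
        f1 == "S" and not a1                              # client SYN
        and f2 == "S" and a2 and (s2, c2) == (s1, c1)     # server SYN+ACK
        and f3 == "." and a3 and (c3, s3) == (c1, s1)     # client ACK
    )


TRACE = [
    "03:47:16.259661 10.0.0.52.1328 > www.us.microsoft.com.http: S 2485226999:2485226999(0) win 16384 (DF)",
    "03:47:16.279661 www.us.microsoft.com.http > 10.0.0.52.1328: S 631604626:631604626(0) ack 2485227000 win 65535 (DF)",
    "03:47:16.289661 10.0.0.52.1328 > www.us.microsoft.com.http: . ack 1 win 17520 (DF)",
]

if __name__ == "__main__":
    print(handshake_ok(TRACE))
```

A trace that opens with a data packet (flag P) instead of a SYN would make the function return False, which is what the article's claim would look like on the wire.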
The IIS team probably noticed this and just accepted the command even though there wasn't actually a valid TCP connection present. So if they receive a packet that looks enough like a HTTP request then do it. There's probably a stack of vulnerabilities here.
The interesting point is that IE and IIS must be using the network stack at a layer lower than the BSD-style socket calls; otherwise these packets would be rejected at the OS level (and no, I don't believe Windows' networking stack is that crap). TCP processing is fiddly, so cue more security holes.
This is also an easy in to hurt IE performance. Rather than responding to the dud packet with a RST, don't respond at all (which according to the article is an acceptable response). I'm not sure how linux handles this atm. The end result is IE is dog slow to start loading the page but every other browser is super quick.
And to all those people who posted saying this is HTTP pipelining, please don't talk about networking, ever. You lack a basic understanding of how network protocols are layered upon each other. It would be better if you just rub your chin and nod sagely, possibly saying "hmmmm" at the same time. That way you won't look so stupid.
Nerd: Derogatory term typically directed at anybody with a lower Slashdot ID than you.
It is preposterous to expect slashdot to be responsible for linking to someone else's site. By putting content on the WWW, you are explicitly allowing others to visit your site.
The site operators are the ones who are liable for their own content and their own bandwidth usage. If they don't want more than a certain number of people visiting their site, they should tweak their web server accordingly. Not everyone has bandwidth that is metered.
just my 2 cents.
I'd rather be a conservative nutjob than a liberal with no nuts and no job.
Perhaps it's because for 60%+ of the servers out there, it actually makes things slower and for 100% of the servers, it makes it less reliable.
As has been said countless times already, no. This is a violation of the TCP standard. Pipelining works within the HTTP standard, and part of that is keeping the connection open using standard TCP signalling technology, which this is definitely not.
Got time? Spend some of it coding or testing
Persistent connections work through the HTTP protocol layer over standard TCP, this is a violation of a much lower TCP protocol layer instead.
Got time? Spend some of it coding or testing
It is being set up properly. What happens is that the browser hasn't closed its half of the connection. When the next request happens it tries a TCP write, but since the server side has closed the connection the write fails. That's what's confusing the blog author; they're not familiar with the TCP protocol. A TCP connection has two halves and it's entirely legal to close one half but not the other, leaving a socket that can be read from but not written to (or vice versa). IE doesn't check for the server-side close like it should, treats the socket as if it's writable (which it is) and writes to it. Since the server's closed the socket on its end, that attempted write generates an RST (which is TCPly correct), the browser gets a write error and finally notices that its connection has been closed by the remote end, closes everything down like it should have much earlier and builds a completely new socket.
You can get this same behavior between two Linux systems. The server side goes:
- socket(...)
- listen(...)
- accept(...)
- read(...)
- write(...)
- shutdown( SHUT_RDWR )
- close()
The client side goes:
- socket(...)
- connect(...)
- write(...)
- read(...)
- write(...)
- Note error
- close(...)
- socket(...)
- connect(...)
In IE, steps 3 and 4 in the client handle one request. Step 5 is an attempt to handle the next request assuming that the server handles persistent connections. Step 6 is where IE notices that the server doesn't do persistent connections. The right thing to do would be to notice the HTTP version and lack of a Connection: header indicating support for persistent connections in the response and close the connection upon receipt of the response. IE is stupid in not handling non-persistent-connection servers as it should, but it's not violating or even bending the TCP protocol spec in any way. It's just stupid coding.
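For anyone who wants to see the "check for the server-side close" part on their own box, here is a rough loopback sketch of the server and client sequences listed above. It is not IE's code and not a real web server: the function names, the canned response and the use of a thread are all just for the demo. The server answers one request and closes; the client reads until recv() returns an empty result, which is how the server's FIN shows up at the socket API.

```python
import socket
import threading


def one_shot_server(listener):
    """Serve a single request, then close: no persistent connections."""
    conn, _ = listener.accept()                      # accept(...)
    conn.recv(1024)                                  # read(...)
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")    # write(...)
    conn.shutdown(socket.SHUT_RDWR)                  # shutdown( SHUT_RDWR )
    conn.close()                                     # close()


def fetch_and_check():
    """Return (response, saw_eof) for one request against the demo server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))                  # any free local port
    listener.listen(1)
    worker = threading.Thread(target=one_shot_server, args=(listener,))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")        # client step 3: write
    response = b""
    saw_eof = False
    while True:                                      # client step 4: read
        chunk = client.recv(4096)
        if not chunk:
            # Empty read: the server has closed its side. A client that
            # notices this opens a new socket instead of writing again.
            saw_eof = True
            break
        response += chunk

    worker.join()
    client.close()
    listener.close()
    return response, saw_eof


if __name__ == "__main__":
    print(fetch_and_check())
```

A client that stops reading as soon as it has the response body never sees that empty read, and only finds out the connection is dead when its next write bounces.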
This is more than just persistent connections. What IE is doing is sending the request for the page *before* sending any of the SYN or ACK packets that every TCP/IP application is supposed to send.
.. note: the connect request does not return until it has been acknowledged.
... where the WSAConnect call has data it will send with the connect packet inside it.
I think you'll find that the request is sent with the connection request, and is perfectly legal TCP/IP.
The thing is, BSD sockets doesn't let you easily do this.
Most Unix apps do this:
SOCKET s = socket(...);
connect(s, ...);
send(s, ...);
What IE is doing is this:
SOCKET s = socket(...);
WSAConnect(s, ...);
This is ALL perfectly valid TCP. Remember; the flags in the packet are what determine how to handle the incoming packets; the data is handled separately. You can quite happily send data with your connect request, as long as you're willing to accept that the request may well fail.
Simon
Coming soon - pyrogyra
it's documented here: "Object Moved Error"
something I can run on my apache server that rejects clients that do not follow the RFC for TCP/IP, and hence rejects IE
I've toyed with blocking based on agent string, but that seems cruel and stoops to the level of MS... (who do this regularly) and besides, it goes against my beliefs of software choice... however, it would be nice to redirect people to a page that says, "Your browser is not standards-compliant"
Whatever. Y'all seem to like it when *I* screw with TCP :-)
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
1) send FIN
2) wait for ACK
3) wait for FIN
4) send an ACK
if the server never receives the FIN in step 3, it assumes that the client wants to keep the connection open for some reason. this is _correct behaviour_ with regards to the TCP spec. if this article is correct, MS is merely exploiting the TCP spec to its advantage. yes, it's dirty and wastes resources, but it works.
the thing that bothers me tho, is this is what should be happening on the server end (a non-IIS server, that is):
1) send FIN
2) wait for ACK
3) ok, got ACK, now wait for FIN
4) (after timeout) hmm, no FIN, must have been lost, so we'll resend our FIN
5) client ACKs that FIN, but doesn't send its FIN
6) server thinks the response FIN is lost again, so probably resends its FIN
now the server will have a max amount of retries before it gives up and finally drops the connection (which is what it was trying for in the first place anyway). this should be a relatively low number, and the timeouts between each retransmission shouldn't be that long either. so unless IE comes back and requests another page fairly quickly, the server _should_ go ahead and drop the connection, so i fail to see how this is a problem.
the only thing i can think of is that the client keeps responding with an ACK to the server's FINs (despite not sending its own FIN), so maybe the server won't drop the connection for that reason (since the client is obviously still alive, just not responding as expected). i don't remember the TCP spec all that clearly with regards to connection teardown, so that may be where IE is able to keep the connection open.
then again, i could be totally wrong here, but i don't think so...
Xfce: Lighter than some, heavier than others. Just right.
I then fired up Windows XP Pro. XP sends lots of netbios stuff at startup and periodically. Very interesting. But again, nothing nearly as interesting as this article suggests. MSIE 6.0.2600.0000... also did not reproduce this non-RFC behavior.
Here is the packet log from tcpdump, with some comments. 192.168.194.211 is the Windows XP client. 192.168.194.1 is the nameserver, and 66.218.71.83 is the web server (www.yahoo.com).
First, XP asks the nameserver for the IP number of www.yahoo.com
15:19:50.426473 192.168.194.211.1026 > 192.168.194.1.domain: 2+ A? www.yahoo.com. (31)
The nameserver responds
15:19:50.702603 192.168.194.1.domain > 192.168.194.211.1026: 2 10/11/0 CNAME[|domain] (DF)
XP/MSIE sends a normal SYN packet. There is no non-RFC packet transmitted before this standard SYN packet, corresponding to an already-open connection before this as the article claims.
15:19:50.734980 192.168.194.211.1032 > 66.218.71.83.http: S 3861657940:3861657940(0) win 16384 <mss 1460,nop,nop,sackOK> (DF)
Yahoo responds with a normal SYN
15:19:50.797377 66.218.71.83.http > 192.168.194.211.1032: S 3674114276:3674114276(0) ack 3861657941 win 65535 <mss 1460> (DF)
XP/MSIE sends a normal ACK to finish the connection setup
15:19:50.802506 192.168.194.211.1032 > 66.218.71.83.http: . ack 1 win 17520 (DF)
XP/MSIE sends the HTTP request (196 bytes)
15:19:50.809064 192.168.194.211.1032 > 66.218.71.83.http: P 1:197(196) ack 1 win 17520 (DF)
Yahoo responds with the first 1460 bytes of data
15:19:50.907564 66.218.71.83.http > 192.168.194.211.1032: . 1:1461(1460) ack 197 win 65535 (DF)
XP/MSIE acks it
15:19:50.919180 192.168.194.211.1032 > 66.218.71.83.http: . ack 2921 win 17520 (DF)
Yahoo responds with another 1460 bytes
15:19:50.923751 66.218.71.83.http > 192.168.194.211.1032: . 2921:4381(1460) ack 197 win 65535 (DF)
XP/MSIE acks it
15:19:50.941174 192.168.194.211.1032 > 66.218.71.83.http: . ack 4381 win 17520 (DF)
Yahoo responds with two more packets
15:19:50.999791 66.218.71.83.http > 192.168.194.211.1032: . 4381:5841(1460) ack 197 win 65535 (DF)
15:19:51.007961 66.218.71.83.http > 192.168.194.211.1032: . 5841:7301(1460) ack 197 win 65535 (DF)
XP/MSIE acks that it has received up to 7301. Notice how Microsoft is properly delaying the ack until a second packet is received.
15:19:51.013652 192.168.194.211.1032 > 66.218.71.83.http: . ack 7301 win 17520 (DF)
So there are two tests, with the MSIE shipped (unpatched) with Windows 98 SE and Windows XP Pro. It looks like there just isn't a story here.
PJRC: Electronic Projects, 8051 Microcontroller Tools
# HTML support
# URI parsing that's RFC-2396 compliant
# Cookies support, RFC-2965 compliant
# XHTML 1.0 rendering
# Plain text rendering
# Image formats support: PNG, JPEG and GIF (no animated GIFs)
# HTTP 1.x Compliance
RFC / W3C STANDARDS
You keep going until you die... "Me".
What is much more interesting is what IE does AFTER it sends that first request without opening the connection... You know the lovely MSN Search page it loves to pop up? Every time IE encounters (for the first time in each session) a non-IIS server, it promptly connects to MSN Search and submits the website address....
You are being watched, friends.
Cool! Amazing Toys.
Is there anyone out there who really expected their *handshake* to mean something?
Since when was there anyone else... that mattered? :)
Majority rules baby. Live with it or do something to change it.
No. The RFC is the framework. The Internet is a system of systems that can be interlinked because they communicate according to a well-established, public, open framework. No company should be in the "business" in mucking with that. If Microsoft discovered a failing in the RFC -- which, by all appearances, they did not -- then there exists a well-established, well-understood path to fixing the framework.
Here, Microsoft has decided, arrogantly IMHO, that their tiny bit of speed enhancement is worth making the TCP connection less reliable. At the very least this wastes bandwidth; it might also waste human resources as people try to track down a "glitch" in their system that simply isn't there. If Microsoft found that the RFC is "broken", why didn't they tell anyone? Why didn't they try to help "fix" it?
No, this is the same penny-ante, half-assed crap that spawned Windows: Let's use all these undocumented tricks to make our software look better. Standards be damned. Interoperability be damned. Our customers be damned, and by God, the greater public be damned.
For all their talk of information "ecosystems", Microsoft still conducts themselves like a classic slash-and-burn outfit.
The Mongrel Dogs Who Teach
When trying to connect to an address of form 1.2.3.4, the program would halt for some twenty-thirty seconds before proceeding.
IIRC this was a problem with Sun's implementation of InetAddress.getByName() on Windows. When passed a string containing a dotted numeric quad, it stupidly tried to do a DNS lookup on it instead of simply filling in the four bytes and handing you an InetAddress. Because who knows- maybe someone registered "192.168.1.23" as a domain name! (Which would be akin to registering "microsoft.com" with a Cyrillic "o", but never mind.) Then of course your thread stalled inside InetAddress for half a minute while it waited for the DNS timeout. This makes me suspect that Sun's code was waiting for the successful DNS response and ignoring the failure response that actually arrived. Probably the same moron was responsible for both bugs. Editing the hosts file became the standard workaround.
I don't know when it got fixed but there's code in there to check for a dotted quad now.
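The check is about this simple. This is a Python sketch of the idea, not Sun's actual code: the function name is made up, and the lookup parameter exists only so the DNS call can be swapped out when trying it.

```python
import ipaddress
import socket


def resolve(host, lookup=socket.gethostbyname):
    """Return an IPv4 address string, touching DNS only for real hostnames."""
    try:
        # A dotted quad parses locally: no DNS query, no timeout to wait for.
        return str(ipaddress.IPv4Address(host))
    except ValueError:
        # Not a literal address, so fall back to a normal lookup.
        return lookup(host)


if __name__ == "__main__":
    print(resolve("192.168.1.23"))
```

With the literal check first, an address like 192.168.1.23 never leaves the machine, so a slow or broken nameserver can't stall the thread.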
Whoever wrote this and his 'team' are clueless. What they were seeing was a keep-alive (persistent) connection... it's total BS that IE would ever send a request to a host without a connection already being open. IIS just allows for persistent connections... when you hit blah.com, you open the sock, send your request and all and specify keep-alive. Now, the socket just stays open, so when they hit another page on the same host, they send a request to the already-open socket without the initial 3-way handshake since they've already done that. If it was true that IIS allowed IE to get a page without a 3-way handshake first (not that the Windows TCP/IP stack would even _allow_ that packet to get through because it's based off of the BSD TCP/IP stack, and a 3-way handshake _must_ be done before any data can get to a user-land socket... and not like any NATed routers would let it through, either), it would allow total TCP hijacking and DoSes. But it's always nice to see that people who don't know jack are able to post stuff to slashdot ;o
You really don't want to do that. HTTP over UDP is simply a bad idea... Why? In order to meet the most basic needs of a stream reliant protocol (ala HTTP) you need a few things:
1. Reordering (No guarantee packets arrive in order)
2. Retransmitting (Detect lost packets and resend)
3. Speed throttling (Packets go too fast -> router interface buffers overflow -> packet dropped)
It is of course possible to write a protocol on top of UDP to do these things, but thankfully we don't have to as someone did the work for us... it is called TCP.
I suppose if you wanted to use a HTTP/UDP mechanism for very short communication only (read: one request packet, one reply packet) then those issues aren't relevant, but otherwise leave the heavy lifting to TCP or other stream based IP protocols.
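To give a feel for how much you'd be reinventing, here is a toy sketch of just points 1 and 2 on the receiving side: put numbered packets back in order and work out which ones need resending. The (seq, payload) packet format and the function name are invented for the example, and it leaves out everything hard (timers, acknowledgements, throttling).

```python
def reassemble(packets, total):
    """Reorder received packets and report which ones are missing.

    packets -- iterable of (seq, payload) pairs in arrival order
    total   -- number of packets the sender said it would send
    Returns (data, missing): data is the joined payload, or None while
    anything is still missing; missing lists the sequence numbers to re-request.
    """
    by_seq = {}
    for seq, payload in packets:
        by_seq.setdefault(seq, payload)      # keep first copy, drop duplicates
    missing = [seq for seq in range(total) if seq not in by_seq]
    if missing:
        return None, missing                 # caller must ask for a resend
    return b"".join(by_seq[seq] for seq in range(total)), []


if __name__ == "__main__":
    print(reassemble([(2, b"c"), (0, b"a"), (1, b"b")], 3))
    print(reassemble([(2, b"c"), (0, b"a")], 3))
```

Even this much needs a wire format, a way to announce the total, and a resend request, all of which TCP already gives you.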
The 4 steps when the server finishes sending data and closes the connection, from the article:
Client          Server
         <--    FIN
ACK      -->
FIN      -->
         <--    ACK
When the server has no more data, it sends FIN.
The server should not be allowed to send more data after the FIN! This is a violation of the TCP spec. Otherwise, how would clients truly know whether or not the server had more data to send?
TCP does support something called "half close". It is possible to indicate that you have no more data to send, but that you are still willing to receive data. This is why both sides must send FIN, in order to cleanly close the connection. If one side sends FIN but the other doesn't, the connection remains open, but data can only flow in one direction (sending from the side that did not send the FIN). This is useful for cleanly shutting down connections and making sure that both sides receive all the data they were expecting.
In the example from the article, when the client receives a FIN but does not send a FIN of its own, this is legal: the TCP connection now is one way, and data can only be sent from the client to the server. The server is not allowed to send more data. So, if IIS is doing this, it is breaking the spec. It is important to note that the client is doing nothing wrong in this case.
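Half close is easy to try on loopback. The sketch below is only a demo of the TCP feature, not of what IE or IIS actually do; the function names and payloads are made up. The server sends its data and calls shutdown(SHUT_WR), which sends its FIN. After that it can no longer send, but it keeps receiving until the client sends a FIN of its own.

```python
import socket
import threading


def half_closing_server(listener, received):
    conn, _ = listener.accept()
    conn.sendall(b"response")
    conn.shutdown(socket.SHUT_WR)     # server's FIN: "no more data from me"
    while True:                       # the receiving direction is still open
        chunk = conn.recv(1024)
        if not chunk:                 # empty read: client's FIN arrived
            break
        received.append(chunk)
    conn.close()


def demo_half_close():
    """Return (what the client read, what the server read after its FIN)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # any free local port
    listener.listen(1)
    received = []
    worker = threading.Thread(target=half_closing_server,
                              args=(listener, received))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    data = b""
    while True:                       # read until the server's FIN
        chunk = client.recv(1024)
        if not chunk:
            break
        data += chunk
    client.sendall(b"late data")      # client -> server still works
    client.shutdown(socket.SHUT_WR)   # client's FIN completes the close

    worker.join()
    client.close()
    listener.close()
    return data, b"".join(received)


if __name__ == "__main__":
    print(demo_half_close())
```

The client's "late data" arrives after the server's FIN, which is legal; what would not be legal is the server sending anything more on that connection.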
Dr. Demento On The 'Net!
Mark this down on the calendars everyone...
Karma whorin' since 1999
700+ comments, 95% of which are:
- MS sucks for breaking RFCs
- Apache should do something about it
- Users of IE are clueless morons.
All of this because some blogger can't read a packet trace correctly. Everyone in the thread who's actually TRIED it (the other 5%) hasn't seen this behavior.
There's no way anything's going to work if IE doesn't send a SYN. Nothing, Nada, Zip. It just won't happen. Firewalls, NAT, transparent proxies would kill it. IIS isn't going to care, the TCP/IP stack won't even let it get there. Same goes for Apache. Get THE book on TCP/IP and find out why.
I think this thread is a prime example of what Slashdot has become. Never mind news for nerds (definition not limited to the Linux crowd) and stuff that matters. We'll post anything as long as it's anti-MS.
No sig, sorry.
I don't know where you get those "95% of which are blabla" from, but I see 800+ comments:
- 70% "the article is fake!" or "I tested it but IE use standard TCP requests. fuck you anti-Microsoft Linux zealots!"
- 10% "MS sucks"
- 10% junk/flamebait/trolls/crapfloods
Sorry, but your claims are completely false. Slashdot is everything but anti-MS. Why do you think your post is modded as +5 Insightful?
That Slashdot is an anti-MS site is simply false. People have been saying how Slashdot is anti-MS for centuries but every time I browse through the comments, there are always lots and lots of pro-MS comments, a lot of them are even modded +3/+4/+5.