The 2.4.x Kernel, ECN And Problem Websites
mitd writes: "Enterprise Linux Today is running an article about how some network devices i.e. routers, do not support ECN (Explicit Congestion Notification), causing WWW sites to be unavailable to 2.4.x kernel based hosts." The article does show you an easy workaround, though. (Read more below.)
"Nice quote: 'The answer is that Linux is once again on the cutting edge of networking technology ...' The article points out some major sites that have not updated their routers to handle ECN packets."
Anything that helps destroy congestion at least has my attention. (And in a parallel universe, legions of Windows users are howling that the Linux hegemonists have again chosen to implement new standards in order to drag them into the fold ;) )
If you find ECN enabled in your distributor's 2.4.x kernel package by default, please consider this a severe mistake on your distributor's part. Please do not consider it a bug in "the 2.4.x kernel". The author of the Enterprise Linux Today article owes Linus and the kernel developers a retraction and correction.
IMHO, the kernel needs a standard on this. Should a network protocol be on or off, at boot time?
My next thought is that ECN is a Good Thing(tm) for these low-grade routers and firewalls. Either people upgrade (and thus remove security holes), or they lose sales, because nobody can reach them.
IMHO, someone needs to write an ECN module for Wintoes, to exploit this potential force for a quality Internet.
We =do= want a quality Internet... ...right??
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Is something as useful as ECN presentable as a mandatory update for infrastructure providers (e.g. Cisco), or merely a nice addition to be added when other software changes/updates are applied to routers and major servers?
It seems that it could have major benefits in improving response times, but only if compliance was the rule rather than the exception. What other OSes currently support ECN? Anyone know? I haven't found much info yet.
Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
Somebody please mod this one up - clearly a lot of people think 'RFC = Standard', when the ECN RFC is clearly Experimental and explicitly not meant for production usage...
There is currently *absolutely no practical benefit* in setting ECN bits in Internet packets, because you need ECN-capable routers throughout a network (or at least at bottleneck points) for ECN to be useful.
ECN is intended to work like this:
- ECN-capable host sends packets, setting the ECN-Capable bit in the IP Header's TOS byte to 1 so that routers know ECN is worth using
- packet experiences congestion in a router somewhere, i.e. router queue is filling up but not yet full
- router, rather than dropping the packet (which it could do, see WRED), chooses to forward the packet but mark it as 'congestion experienced' using a spare bit in the TOS byte of the IP header.
- host senses that congestion was experienced and does something about it - essentially the same as if the packet was dropped (e.g. TCP will halve its window size) but with the benefit of being able to process the packet rather than having to wait.
The end result should be quicker adaptation to congestion conditions, by avoiding some timeouts and retransmissions.
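The marking scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular router or TCP stack implements it: the function names, the queue thresholds, and the direct hand-off of the marked byte to the sender are assumptions made for the example (in RFC 2481 the receiver echoes the mark back to the sender in the TCP header). The bit masks follow RFC 2481, which places ECT and CE in the two low-order bits of the IPv4 TOS octet.

```python
# Bit masks for the ECN field in the IPv4 TOS octet, per RFC 2481.
ECT = 0x02  # ECN-Capable Transport: set by the sending host
CE = 0x01   # Congestion Experienced: set by a congested router


def router_forward(tos, queue_len, mark_threshold, queue_limit):
    """Hypothetical router decision for one packet.

    Returns the (possibly modified) TOS byte, or None if the packet is dropped.
    Real RED/WRED queues mark or drop probabilistically; this sketch uses
    fixed thresholds to keep the logic visible.
    """
    if queue_len >= queue_limit:
        return None  # queue is full: nothing to do but drop
    if queue_len >= mark_threshold:
        if tos & ECT:
            return tos | CE  # ECN-capable flow: mark instead of dropping
        return None  # non-ECN flow: a drop is the only congestion signal
    return tos  # no congestion: forward unchanged


def sender_react(cwnd, tos_seen):
    """Hypothetical sender reaction: halve the window on a drop or a CE mark."""
    if tos_seen is None or tos_seen & CE:
        return max(1, cwnd // 2)
    return cwnd


if __name__ == "__main__":
    marked = router_forward(ECT, queue_len=15, mark_threshold=10, queue_limit=20)
    print("TOS after congested router:", marked)
    print("window after reaction:", sender_react(10, marked))
```

In both the marked and the dropped case the sender backs off by the same amount; the difference is that the marked packet still arrives, so nothing has to be retransmitted.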
ECN is an interesting technique, but it will take a long time for it to be tested and debugged in realistic conditions, and for people to deploy it widely (perhaps in a modified version that is Standards Track within the IETF). Some routers, particularly routers in the core of the Internet, may never use ECN, since dropping packets is easier than modifying one bit.
Turning on ECN now will at least mean that some firewalls won't drop packets with ECN bits set, which is probably a good thing, but it's only going to help the ECN researchers in practical terms.
Maybe RFC 1812 - "4.2.2.6 Unrecognized Header Options: RFC 791 Section 3.1 A router MUST ignore IP options which it does not recognize." (caps emphasis theirs, not mine)
try { do() || do_not(); } catch (JediException err) { yoda(err); }
interesting. except for burstnet, the register (both running linux) and e3expo (nt) all the sites in that list run solaris.
and i'd bet that burstnet & the register use some sort of linux load-balancer, skewing the results.
---
Binary-only modules really aren't supported, you're not going to hear much crying on linux-kernel if they don't work. If you really-really-really cannot distribute modules precompiled for the major stock kernels (stock RH, Mandrake, Debian, SuSE, Caldera) or source then you can always do what 4-Front does, use a small shim that can be distributed as source. Recompile the shim on the target machine and voila! Linux will always be source compatible throughout a stable release, and that is what matters most.
-- Remember: Wherever you go, there you are!
And if you would finish the story you would know that the vger admin turned off the DUL when he learned that it was causing problems. Case closed.
Way to go. You tell 'em!
Perhaps one way to describe the situation succinctly would be:
The problem is network devices that don't implement ECN and fail to act passively with regard to the formerly reserved bit now used for ECN.
now we need to go OSS in diesel cars
According to this message on linux-kernel, David S. Miller plans to upgrade vger.kernel.org, the linux-kernel mailing list server, Real Soon Now. This will prevent users behind routers that don't understand ECN from using the linux-kernel mailing list!
Is this irresponsible or just a good incentive for the entire internet to upgrade their routers?
cpeterso
Please refer to the bold, red warning prefacing the linux-kernel mailing list FAQ:
Hot off the Presses:
On 22-FEB-2001, vger.kernel.org will enable ECN. You may need to switch ISP in order to receive linux-kernel email. See the section on ECN for more details.
On 25-JAN-2001, David Miller announced that vger.kernel.org will enable ECN in 4 weeks time. This means if your email account is with an ISP which has a buggy router, you will no longer be able to receive linux-kernel mail (as well as other mailing lists hosted on vger). You should check if your ISP is ECN tolerant, and get them to fix their routers or switch to another ISP.
Of course, these are the same people that use the MAPS DUL to block dial-up modem users from posting to the linux-kernel mailing list. Rik van Riel threw a temper tantrum, saying the DUL was class prejudice based on internet connection and that "DUL is an unethical list to use because it assumes guilty by default. Anyway, since linux-kernel has chosen to not receive email from me I won't bother answering VM bugreports or anything here." Alan Cox quickly replied, "Thats ok. Andrea will I am sure be happy to take over as maintainer [of the VM subsystem]."
cpeterso
Before opening your pie hole, read the RFCs. Only broken routers who DO NOT OBEY the RFCs fail to pass ECN.
but if any Linux admins working for me were upgrading production servers to each new kernel 'just because it was available', they'd get some lecturing. You upgrade production boxes when you NEED to. ie: A security patch...
It only takes moments to skim the kernel changelog for each new version.
Also, as I've said before, why on earth would you turn on something like ECN not knowing what it was? And the help file for ECN *DOES* say specifically that it will cause problems on the internet, because many routers don't support it yet.
This has nothing to do with instability. The kernel is very stable; this has to do with people using things without doing the research.
The reason a new 'version' isn't released once or twice a year only? OPEN SOURCE. Whenever there are a reasonable number of bug fixes, a new version comes out.
Whether ECN is experimental or not, *standards* dictate that the bits in use should be simply passed through by other routers. If a router doesn't understand certain option bits, it's supposed to IGNORE them. It is routers NOT following this *long-standing* standard that are causing the problem.
I find it strange. In moving to 2.4 kernels, the first thing I did was, of course, run through the configuration.
For each option that I didn't recognize, I hit the help button. The help button for ECN (which defaults to off) specifically states that ECN is not supported by some routers, and currently may cause problems with reaching websites on the Internet, so I left it off.
So my question is: Why would you turn on a new network option without knowing what it was?
Unused bits in packets, be it IP or another protocol, could be used for a subliminal channel. So your statement that they should always be left alone isn't always true. The paranoid among us should always clear them.
That said, you're probably right most of the time. Why fiddle with them when they're of no concern to you?
----------------------------------------------
the pun is mightier than the sword
Or the firewall manufacturer could be forward-thinking, realise that someday someone might have a useful reason to set that bit, and reject the packet, probably by sending a RST with ECN unset. That way the experimental host can be notified of the problem, and can try again without ECN if it chooses.
I have no disagreement with firewalls being paranoid. I do disagree with firewalls dropping these packets silently. Especially seeing as upgrades fixing the problem have been available since mid-2000, according to here.
-Spiv.
Actually, ECN is designed to be backwards compatible - if a host doesn't understand ECN, it should respond with a packet with the ECN bit turned off, and the ECN-aware originating host will behave accordingly.
The problem is routers that drop these packets silently. They should either let them through, or if paranoid, reject them, sending a RST back to the original host, which can then retry without ECN. Dropping silently just makes the connection attempt "hang", until it times out.
Further, it is *not* enabled by default, can be toggled at runtime via /proc/sys/net/ipv4/tcp_ecn and comes with warnings in the appropriate build option. I'd say that's a perfectly responsible way to introduce a new feature.
-Spiv.
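For readers who want to check the runtime toggle mentioned above, here is a minimal Python sketch that reads /proc/sys/net/ipv4/tcp_ecn. The helper names are made up for illustration, and the interpretation is deliberately loose: the only thing assumed is that 0 means ECN is off and any other value means some form of ECN is enabled (the exact meaning of non-zero values depends on the kernel version).

```python
from pathlib import Path

# Runtime toggle named in the comment above.
TCP_ECN_SYSCTL = Path("/proc/sys/net/ipv4/tcp_ecn")


def ecn_setting(path=TCP_ECN_SYSCTL):
    """Return the integer stored in the tcp_ecn sysctl, or None if unreadable.

    None covers kernels without the file, non-Linux systems, and
    permission or parse errors.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(value):
    """Hypothetical helper: turn the raw value into a short status string."""
    if value is None:
        return "tcp_ecn not available"
    if value == 0:
        return "ECN disabled"
    return "ECN enabled (mode %d)" % value


if __name__ == "__main__":
    print(describe(ecn_setting()))
```

Turning the feature off is a matter of writing 0 to the same file as root; the script above only reads it, so it is safe to run as an ordinary user.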
Also, please note that using DUL generally does not block dial-up users: it forces them to use the ISP's server as a relay, as it should be.
It is highly debatable if forcing the use of a third party relay is a good thing or not. My own opinion is that the intention should be to eliminate these. The more third party machines an email appears to have passed through the harder it is to find out where it really came from.
The problem is in routers that are too intelligent for their own damn good, that busily reset flags that they shouldn't be touching.
Or even software designers thinking they are doing something clever when in fact they are being completely daft. A common problem, certainly not confined to IP coding in routers.
However, it's worth pointing out that this isn't trying to force the user to use an arbitrary third-party relay. Instead, this is trying to get dialup users to relay through their own ISP's mail server.
:)
With certain ISP business models an ISP third party relay is little different in practice from an open third party relay.
If properly configured, the result is to increase accountability.
That can be a very big if
Some ISPs add headers to identify the message source and, even if they don't, they've got server logs to allow them to track things in the event of spamming.
A necessary first step is to verify someone's identity before giving them access. But then knowing which account used which IP, and when (static or at least fairly static IP addressing helps here), is the information you'd actually need.
Also there are advantages to spammers in using third party relays, any third party relays... e.g. you only need to handle a subset of SMTP conditions when sending exclusively through a third party relay.
seeing as upgrades fixing the problem have been available since mid-2000, according to here.
Upgrading a large network comprised of hundreds or thousands of routers takes time to plan, and you don't want to do it too often, or until you're sure the new code base is going to work properly. A year is not unreasonable to obtain, test, plan & implement such an upgrade.
Actually, backwards compatibility was built into this. The problem is buggy equipment, which misbehaves when presented with option bits which it doesn't understand. This behavior violates RFC 791 Section 3.1 "A router MUST ignore IP options which it does not recognize.". Which means, pass on the packet with these options unchanged, rather than silently dropping them.
I agree actually, but some would say if you're the kind of person that turns kernel options on and off without reading all the text first and understanding all of it then you shouldn't be turning kernel options on and off - leave it to the distributions (who afaik all have ecn off by default)
Also, have you submitted a patch to fix the documentation?
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
ECN is disabled by default, there's big warnings in the kernel help... This is hardly newsworthy ;-)
Blessed are the pessimists, for they have made backups.
A *standard* RFC says that if a router doesn't know what to do with one of the reserved bits, it should leave it alone. The router doesn't have to understand ECN to do this.
--
Win dain a lotica, en vai tu ri silota
The thing that I don't understand is why the license agreement that comes with most drivers prohibits me from making copies of the drivers. Honestly, are you going to sell any fewer products if I give a copy of the driver to my friend?
I don't want free as in beer. I just want free beer.
I disagree that it is not newsworthy. I was having this very problem, and this article helped me correct it.
True, it does say in the kernel configuration that this option might get you into trouble. So do several options. What the kernel help doesn't say is any good way to tell that ECN is giving you problems. No diagnostic measures to try in the event of problems.
Some of us like to try new things. We like to see what happens if we enable a feature, because we like to find bugs and squash them. Many people who are running Linux just want a stable system to work with, and that's good. However, those of us who remember what it was like before Linux went mainstream want to continue to push the envelope.
www.eFax.com are spammers
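On the complaint above that there are no diagnostic measures to try: one rough approach is to attempt a plain TCP connection with a timeout, once with ECN enabled and once with it disabled, and compare. The Python sketch below assumes exactly that procedure; the function names and the wording of the verdicts are invented for the example, and a single failed connection can of course have many causes other than ECN.

```python
import socket


def tcp_connect_ok(host, port=80, timeout=10.0):
    """Return True if a TCP connection to host:port completes within timeout.

    A silently dropped SYN shows up here as a timeout, which is the symptom
    the thread attributes to ECN-intolerant devices.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals and name-resolution errors
        return False


def interpret(ok_with_ecn, ok_without_ecn):
    """Hypothetical helper: combine the two test results into a verdict."""
    if ok_with_ecn:
        return "reachable with ECN on; ECN is not the problem"
    if ok_without_ecn:
        return "reachable only with ECN off; suspect an ECN-intolerant device on the path"
    return "unreachable either way; look elsewhere"


if __name__ == "__main__":
    # Run once with tcp_ecn set to 1 and once with it set to 0, then compare.
    print(tcp_connect_ok("www.example.com", 80, timeout=5.0))
```

The two runs have to be made by hand, toggling /proc/sys/net/ipv4/tcp_ecn in between, since the script does not change any kernel settings itself.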
Get the story right guys. This isn't a "linux is up to date while other people aren't" story -- this is a "linux is using a protocol marked as EXPERIMENTAL" story. EXPERIMENTAL protocols are protocols which are not only not internet standards, but are not even standards track.
If using an EXPERIMENTAL protocol breaks stuff, don't use it. You certainly shouldn't expect people to conform to your own non-standard behaviour.
Tarsnap: Online backups for the truly paranoid
Only broken routers who DO NOT OBEY the RFCs fail to pass ECN.
Right... only routers which do not obey an EXPERIMENTAL RFC run into problems. Guess what? You don't have to obey experimental RFCs. That's why they're *experimental*, not *standards*.
ECN is mature and at the Minn. IETF meeting it was voted to be added to the host requirement standard.
BZZZZZT. Nope, try again. There is now a *draft proposed standard* for ECN. That's it. It isn't a standard yet, and won't be for quite some time yet.
Which RFC would that be? I can't seem to find it anywhere.
Maybe RFC 1812 - "4.2.2.6 Unrecognized Header Options:
Which doesn't apply here, since ECN is implemented via bits in the TOS octet, not in an optional IP header.
In case people are too lazy to look up RFC 2026 themselves, here's the relevant section:
And from the top of RFC 2481:
Putting aside for now the arguments about supporting experimental protocols and the use of once-used-and-now-reserved bits, there is a very simple issue here regarding firewall design.
Secure firewalls are designed to block traffic by default.
In other words, if the firewall doesn't understand the packets being sent through it, it will drop them. There's nothing wrong with this behaviour; in fact, if you try to build a "default-accept" firewall by blocking off packets which you know to be undesirable, you'll inevitably run into problems. However, anyone who has tried to get streaming media, or play warcraft, or use any other new protocols through an old firewall will be able to say that this policy can be a nuisance.
Which, of course, is one reason why there is an internet *standards track* giving people time to adapt to new protocols.
Where was this article two weeks ago, when we were upgrading all of our production servers to the 2.4.3 kernel, and couldn't figure out why we couldn't hit www.ibm.com or www.sabre.com?
After much troubleshooting, we found the problem. Perhaps the kernel help for ECN should have the warning about certain routers not supporting ECN nearer to the top of the help, instead of in the second paragraph :)
- James
signature smigmature
- This was not in the IP options section, it was in the TOS section.
- These are probably not routers.
- Standards don't mean shit.
- From RFC 2481: "Because of the unstable history of the TOS octet, the use of the ECN field as specified in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits."
- In RFC 791, the bits in question are shown set to zero.
- If a firewall doesn't understand a packet, and wants to protect a server behind it, it should drop the packet. Better for an experimental user to not be able to reach a site than for a system to be crashed or hacked.
Sure, the devices which fail should be updated, as this is now going to become somewhat common. If they are firewalls, and they are good ones, they've probably already notified someone of the increase in dropped packets of this type, and the solution is already in the works.
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
It might have helped if you had decided to read the comment you replied to. alehmann said nothing about implementing ECN. ahelmann did say something about blocking ECN. There is a world of difference. See, routers shouldn't just throw packets away if they have extra information in them. This is rude, and hinders adoption of new protocols, which don't hinder the router's operation in the least, and will often allow hosts on either side of the router to utilize these new protocols, even though the router in question cannot.
Go away and come back when you have learned about the Internet.
--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
I've been using the 2.4 kernel on my laptop since the week it was released, and I've had no problems (to my recollection, at least) visiting any sites. Granted, I use Datek instead of E-trade =). Looks like the sites that have older equipment have been quickly updating though, and I see no reason to disable this forward-thinking ECN.
"The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
An "opt-out" version could be made too, but I guess an external maintainer would be needed for such a list -- it wouldn't be desirable for every other connection to drop in the process of building what supposedly is a performance booster.
Hey, since MS owns Hotmail, I am sure that someone there thinks that they are not under any obligation to help out by accepting ECN.
;-)
"Bill, do you think we should use this ECN stuff?"
"I don't know, do we own it?"
"Nope"
"Does NOT accepting this Screw up Linux?"
"Yep"
"Can you read my Mind?"
"Yep!"
Of course, I would never accuse anyone of being negligent, or of being underhanded. Me? never!
Check out the Vinny the Vampire comic strip
"It is a greater offense to steal men's labor, than their clothes"
However, it's worth pointing out that this isn't trying to force the user to use an arbitrary third-party relay. Instead, this is trying to get dialup users to relay through their own ISP's mail server. If properly configured, the result is to increase accountability. Some ISPs add headers to identify the message source and, even if they don't, they've got server logs to allow them to track things in the event of spamming.
All right, this is a flame. Dareth I answer it...
ECN is *NOT* a standard, nor even standards track.
The fact ECN is written up as a request for comments document (RFC) means it *is* well on its way to becoming an Internet standard. Even the process itself of becoming an Internet standard is written up as an RFC. Look at the main web page at www.ietf.org and click on the link marked "The Internet Standards Process." Look at what is there! RFC 2026!
Many protocols in modern use never became an Internet "standard"; these include things like Mobile IP and 802.11 wireless Ethernet. The identification protocol used by almost any IRC server is defined in RFCs 1413 and 931; they never became a standard. The Internet standards actually issued number fewer than 70. The IETF itself doesn't link to them much anymore since there is normally an RFC representing the final form of each one.
[The] systems that you have 'problems' with are systems that support ECN, not systems that don't support ECN
Sorry! Thanks for playing. If the client says it supports ECN by flagging that fact with the bits once reserved for future use, it will not run into problems if the other side says it does not. The routers, firewalls, load balancers and/or servers on the other end that do not know simply to leave those bits alone and continue normally can be faulted. The TCP protocol said those bits might be used later, but many programmers did not heed that warning. Instead, they drop packets using the once reserved bits, send TCP or ICMP reset messages, etc.
So in a way, it is the client's fault for supporting a newer extension of TCP/IP than the older one. The extension works fine -- as long as the other end still tries to establish a connection regardless of ECN support!
The reason you have trouble with these sites is because you have a client which respects the ECN bit, and there are thousands/millions of other clients which don't, which has the effect of you never reaching the site, since you always back off in deference to those clients which don't.
Major sites must be busy to the point their links are congested, aren't they? I hope not. Read the article; the problem is routers, firewalls, and other devices seeing the bits marked "for future use" being used, and considering packets invalid. Again, the fact that an ECN host tries contacting a host that does not support ECN is irrelevant; as long as the packets get through, the ECN-aware end will realize the other end does not, and revert to normal congestion behavior.
If no device on the other end spoke ECN, you wouldn't have this problem, as it wouldn't have any way to know to treat an ECN aware client differently than one that wasn't.
The ECN aware client is in charge, at least in the failure cases cited by the article. In most failure cases (at least those I have seen), it is the *client* requesting that the connection use ECN in the first place (although servers are welcome to as well). If after the initial handshake it discovers the remote host does not know ECN, it uses the old-style of TCP throttling behavior in response to bad packets. The ECN extension was designed to allow backwards compatibility with older clients; the people who designed it were not that foolish.
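The handshake fallback described here can be illustrated with the TCP flag bits involved. The sketch below is an illustration under stated assumptions rather than any stack's real code: the function names are invented, and the flag values are the usual positions of SYN, ACK, ECN-Echo (ECE) and CWR in the TCP flags byte. Following RFC 2481, the client asks for ECN by setting both ECE and CWR on its SYN, and a willing server answers with ECE (but not CWR) on the SYN-ACK; any other reply means the connection proceeds without ECN.

```python
# TCP flag bits relevant to ECN negotiation.
SYN = 0x02
ACK = 0x10
ECE = 0x40  # ECN-Echo
CWR = 0x80  # Congestion Window Reduced

_MASK = SYN | ACK | ECE | CWR


def is_ecn_setup_syn(flags):
    """True if the client's SYN requests ECN (SYN with ECE and CWR set)."""
    return (flags & _MASK) == (SYN | ECE | CWR)


def is_ecn_setup_synack(flags):
    """True if the server's SYN-ACK agrees to ECN (ECE set, CWR clear)."""
    return (flags & _MASK) == (SYN | ACK | ECE)


def connection_uses_ecn(syn_flags, synack_flags):
    """ECN is used only when both sides signalled it during the handshake."""
    return is_ecn_setup_syn(syn_flags) and is_ecn_setup_synack(synack_flags)


if __name__ == "__main__":
    ecn_syn = SYN | ECE | CWR
    print(connection_uses_ecn(ecn_syn, SYN | ACK | ECE))  # both ends ECN-aware
    print(connection_uses_ecn(ecn_syn, SYN | ACK))        # old server: plain TCP
```

The second case is the backwards-compatible path: the connection still comes up, just without ECN. The failures discussed in this thread happen earlier, when a device on the path discards or resets the SYN so that no reply arrives at all.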
Get an education before you start posting pretending you know what you're talking about.
Is the fact I have a bachelor's degree in Electrical and Computer Engineering (with honors), 99% of the work for a master's degree in the same, and the fact I was accepted to one of the top doctoral schools in the country enough education? I have spent many many years studying network protocol theory, and several years administering servers. I even wrote my own IRC client at one point in time based off the RFC documents on it, and that protocol is hardly "experimental" anymore...
Let me just say that it is the systems that do *not* handle ECN that are at fault, not the systems that *do* support it. Read the RFC specification here, or from your nearest RFC mirror (#2481). Note how bits marked as "presently unused" and "reserved for future use" are used for explicit congestion notification.
Any protocol implementation with a bit of sanity would know to leave unchanged any reserved bits it did not know how to handle. Unfortunately, many systems do not do this. Some firewalls see reserved bits being used as a threat, and reset connections. Other systems have no clue how to react if a reserved bit is not the default value.
A partial list of sites I know have trouble with ECN enabled (thank goodness they are the minority of web sites out there) is below. But this is like the Y2K bug; it never really should have existed.
Sites with known ECN problems (that I've seen, anyway)
(These are only sites I visit rarely, thank goodness; I typically surf another 20+ websites daily without incident)
Dropping packets silently is more secure. Don't ask me why, I asked one time, and they just said, "dropping packets silently is more secure, now shut up and sit down, you non-NANOG-reading luser, whilst I upgrade my routers to the latest FreeBSD-STABLE ".