Linux Virtual Ethernet Bug Delivers Corrupt TCP/IP Data (vijayp.ca)
jones_supa writes: Vijay Pandurangan from Twitter warns about a Linux kernel bug that causes containers using Virtual Ethernet devices for network routing to not check TCP checksums. Examples of software stacks that use Virtual Ethernet devices are Docker on IPv6, Kubernetes, Google Container Engine and Mesos. The kernel flaw results in applications incorrectly receiving corrupt data in a number of situations, such as with bad networking hardware. The bug dates back at least three years; it is present in kernels as far back as the Twitter engineering team has tested. Their patch has been reviewed and accepted into the kernel, and is currently being backported to -stable releases back to 3.14 in various distributions. If you use containers in your setup, Pandurangan recommends that you deploy a kernel with this patch.
After ten years and billions of dollars, Twitter has finally contributed something useful to society.
Given that they wantonly violate people's civil liberties by shadowbanning Twitter accounts of people they deem politically incorrect while continuing to allow tweets from known terrorist groups - Twitter's patch should NOT be accepted in accordance with their own code of conduct.
"Cutting off your nose to spite your face."
Corrupt Data? I thought his name was Lore.
I was under the impression that virtual ethernet devices intentionally don't bother verifying checksums, because they were intended to be used in situations where there is very little probability of the data being corrupted.
Any application that checked for proper data (encrypted links, ssh, etc.) would have automatically been protected from this.
And, any attacker with access to the local network can already craft arbitrary TCP or other data and calculate a 'proper' checksum to have data pass up the stack.
So, I'm glad it's fixed...but hard to see why this made it to Slashdot!
My experience has been that the TCP checksums are fairly useless - they can detect single bit errors only since they are just simple checksums, not CRCs or something more sophisticated. According to the article what was actually happening was that the virtual ethernet driver (veth) did not flag bad packets correctly. There's a flag that tells TCP there's no need for it to checksum since the hardware has already verified the packet. On errors, the veth driver set that flag instead of the one that says it couldn't verify the checksum.
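A toy model may make the flag mix-up described above easier to follow. This is only a sketch: every name in it (`CsumStatus`, `Packet`, `veth_forward`, `tcp_accepts`, `deliver`) is made up for illustration and is not kernel code. It assumes a packet arrives marked "not verified", and that the buggy virtual device promotes it to "already verified" without looking at it.

```python
from dataclasses import dataclass
from enum import Enum


class CsumStatus(Enum):
    NONE = "not verified yet; the TCP layer must check"
    UNNECESSARY = "already verified; the TCP layer may skip the check"


@dataclass
class Packet:
    payload: bytes
    checksum_ok: bool                      # ground truth, unknown to the stack
    status: CsumStatus = CsumStatus.NONE   # what the stack believes


def veth_forward(pkt: Packet, buggy: bool) -> Packet:
    """Hypothetical virtual device: the buggy variant marks unverified
    packets as verified without inspecting them."""
    if buggy and pkt.status is CsumStatus.NONE:
        pkt.status = CsumStatus.UNNECESSARY
    return pkt


def tcp_accepts(pkt: Packet) -> bool:
    """Hypothetical TCP layer: trusts the flag, otherwise verifies itself."""
    if pkt.status is CsumStatus.UNNECESSARY:
        return True                        # trusted, no software check
    return pkt.checksum_ok                 # software verification


def deliver(checksum_ok: bool, buggy: bool) -> bool:
    """Return True if the application ends up receiving the packet."""
    pkt = Packet(b"data", checksum_ok)     # arrives unverified
    return tcp_accepts(veth_forward(pkt, buggy))
```

In this model `deliver(False, buggy=True)` hands a corrupt packet to the application, while `deliver(False, buggy=False)` drops it because the software check still runs.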
How often does the TCP/UDP checksum detect errors that the previous two could not?
TCP/UDP checksums are useful for one thing primarily - mitigating the effect of defective network hardware. That is about the only thing that can cause a transport level checksum error. Anything else is caught with a very high probability by Layer 2 protocols, which typically use a 32 bit CRC. Some Layer 2 protocols do have relatively weak checksums, but not so weak that TCP checksums are likely to catch much more than they do.
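For anyone who wants to see the difference in strength, here is a minimal sketch comparing a 16-bit ones' complement checksum (written in the style of RFC 1071) with Python's `zlib.crc32`. The sample message and the variable names are invented for the example. It illustrates one known blind spot of the simple sum, reordered 16-bit words, and is not a full error-rate analysis.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"PAY $100 TO BOB."               # 16 bytes, eight 16-bit words
# Swap the first two 16-bit words: addition is commutative, so the sum is blind to this.
swapped = original[2:4] + original[0:2] + original[4:]
# Flip a single bit in the first byte.
bit_flip = bytes([original[0] ^ 0x01]) + original[1:]

same_sum_after_swap = internet_checksum(original) == internet_checksum(swapped)
same_sum_after_flip = internet_checksum(original) == internet_checksum(bit_flip)
same_crc_after_swap = zlib.crc32(original) == zlib.crc32(swapped)
```

Here the simple checksum catches the single-bit flip but not the word swap, while the CRC-32 values for the original and swapped messages differ.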
I wish I had mod points. Please mod parent +1 funny
It turned out to be the fault of the VM and functionality offloading. See here: http://stackoverflow.com/quest...
Religion is what happens when nature strikes and groupthink goes wrong.
The only purpose of the checksum is to increment a universally ignored error counter so operators know to replace broken hardware.
TCP checksums are wholly insufficient to prevent corruption of TCP streams at anything resembling a useful rate. It went unnoticed for years because checksums are irrelevant.
I make a local cache of debian packages on one of my VMs, using apt-mirror. From time to time one of the packages would fail its checksum - reloading it from the offsite source would usually work. When I changed the VM's ethernet device to a virtual e1000, the problems went away. I later found an interesting cabling issue that was causing transmission errors between a switch and the physical host.
How many eyes were looking at the Virtual Ethernet feature/code?
Clearly, not enough.
I've said it before and I say it again. You need enough QUALIFIED and MOTIVATED eyes. You also need clear QA test cases in order to render all bugs shallow.
*** Good luck to everyone and have a happy day!
How often does the TCP/UDP checksum detect errors that the previous two could not?
May I remind the distinguished audience that IPv6 does NOT have a Header checksum. Therefore, on IPv6, TCP/UDP/SCTP checks are MANDATORY in all cases (UDP checks were optional in IPv4; the guys doing VoIP are <sarcasm>jumping for joy</sarcasm> about it...).
One REALLY NEEDS to do those checks.
(Computer networks teacher speaking here).
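As a rough illustration of what that mandatory check covers, here is a sketch of the UDP checksum over an IPv6 pseudo-header (source address, destination address, upper-layer length, three zero bytes, next header 17). The helper names and the `2001:db8::` documentation addresses in the usage note are assumptions made for the example, not taken from any particular stack.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit ones' complement sum of the input."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, upper_len: int) -> bytes:
    """IPv6 pseudo-header for UDP (next header value 17)."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", upper_len, 17))


def udp6_datagram(src: str, dst: str, sport: int, dport: int,
                  payload: bytes) -> bytes:
    """Build a UDP header plus payload with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    covered = pseudo_header(src, dst, length) + header + payload
    csum = ~ones_complement_sum(covered) & 0xFFFF
    if csum == 0:
        csum = 0xFFFF                        # zero means "no checksum", so send 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp6_checksum_ok(src: str, dst: str, datagram: bytes) -> bool:
    """A valid datagram sums to 0xFFFF together with its pseudo-header."""
    covered = pseudo_header(src, dst, len(datagram)) + datagram
    return ones_complement_sum(covered) == 0xFFFF
```

For example, `udp6_checksum_ok("2001:db8::1", "2001:db8::2", udp6_datagram("2001:db8::1", "2001:db8::2", 5060, 5060, b"hello"))` should hold, and flipping a bit in the payload should make the check fail.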
*** Good luck to everyone and have a happy day!
Does anyone know if XenServer uses this functionality?
Slashdot your i and slashcross your t.
I worked with very early internet technology in the 1990's. Back then, the network chip was on a separate board just like the graphics card. The MAC address had to be uploaded into flash memory on startup. These could blow up given the right conditions, then the card would either just keep blasting out random packet data, or traffic collisions would result in fragment packets (less than the minimum size) going out. Some early day drivers would pick up these packets. Filtering was done in software. Now with these chips built onto the motherboard, the hardware does all the filtering.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
I once ran into problems where a server with a certain Intel chip would sometimes corrupt data over the PCI bus. I had to put a check in my driver to detect that chip and turn off a major PCI optimization if that one chip was detected. CRC errors would not detect it because that was handled in the network adapter. At the time some of my tests were with NetBEUI, which has no L3/L4 checksums, and resulted in corrupt files (which were detected by the test scripts).
I've run into a number of cases where data gets corrupted like this, and while the IP/TCP/UDP checksum isn't all that great, it did allow identifying the problem.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
I started working on transport protocols, and I always wondered about this:
Ethernet has its CRC, IPv4 has its header checksum, how often does the TCP/UDP checksum detect errors that the previous two could not?
TCP was finalized in 1981, long before modern Ethernet was around. TCP was originally developed for ARPANET, which used various long-haul communications technologies (think "modems") to interconnect sites. In those days, communications hardware usually did not have CRC or any other checksum checking. So TCP did its (simplistic) checksum to provide some protection.
IPv6 does not have the checksum, but the ethernet one is still there.
IPv6 came along much later, after Ethernet (and long-haul communications) had advanced to the point where CRC protection was a standard expectation. The value of a checksum in the IP header was recognized as sufficiently pointless to drop it.
The mystery to me is that an Ethernet NIC passes up a known corrupt packet, and the kernel doesn't drop it! I suppose this is so it's possible for a human to sense the presence of hardware failures, since pcap (aka tcpdump, aka Wireshark) can show the corrupt packets. It really sucks that this, due to a subtle bug, could result in known bad packets leaking into apps which assume the lower layers did their job. That's what happened, yes?
How often does the TCP/UDP checksum detect errors that the previous two could not?
May I remind the distinguished audience that IPv6 does NOT have a Header checksum. Therefore, on IPv6, TCP/UDP/SCTP checks are MANDATORY in all cases...
One REALLY NEEDS to do those checks.
(Computer networks teacher speaking here).
May I remind the distinguished teacher that (a) the checksums in TCP and UDP are lame compared to CRC and (b) they are irrelevant given a sufficiently robust data link layer. As I said in another post just above, TCP and UDP originally included checksums because IP was being carried over lame data links, and so the checksums were a bit of "belt and suspenders". Very few data link protocols today lack a robust CRC, so the checksums are anachronistic.
The particular issue in this topic seems to be not that the Ethernet CRC32 is lame (it most certainly is not), but that the Linux network stack had a subtle bug introduced into it that caused known bad packets to be passed along as if they were not. This is not a failure of Ethernet, it is a failure of the Linux network stack. I for one am ecstatic that this has been found, because I think this might be what has been haunting a product of mine for several years! (hopefully preparing to rejoice).