Linux Virtual Ethernet Bug Delivers Corrupt TCP/IP Data (vijayp.ca)

Posted by timothy on Monday February 22, 2016 @05:22AM from the absolute-power dept.

jones_supa writes: Vijay Pandurangan from Twitter warns about a Linux kernel bug that causes containers using Virtual Ethernet devices for network routing to not check TCP checksums. Examples of software stacks that use Virtual Ethernet devices are Docker on IPv6, Kubernetes, Google Container Engine and Mesos. The kernel flaw results in applications incorrectly receiving corrupt data in a number of situations, such as with bad networking hardware. The bug dates back at least three years – it is present in kernels as far back as the Twitter engineering team has tested. Their patch has been reviewed and accepted into the kernel, and is currently being backported to -stable releases back to 3.14 in various distributions. If you use containers in your setup, Pandurangan recommends that you deploy a kernel with this patch.

Better late than never by Anonymous Coward · 2016-02-22 05:40 · Score: 4, Funny

After ten years and billions of dollars, Twitter has finally contributed something useful to society.
Twitter's fix should be denied by Anonymous Coward · 2016-02-22 05:41 · Score: 1, Interesting

Given that they wantonly violate people's civil liberties by shadowbanning Twitter accounts of people they deem politically incorrect while continuing to allow tweets from known terrorist groups - Twitter's patch should NOT be accepted in accordance with their own code of conduct.
There's a term for that by halivar · 2016-02-22 05:59 · Score: 3, Insightful

"Cutting off your nose to spite your face."
Re:corrupt data by U2xhc2hkb3QgU3Vja3M · 2016-02-22 06:22 · Score: 1, Funny

Corrupt Data? I thought his name was Lore.
I could have sworn this was intentional by Verdatum · 2016-02-22 06:23 · Score: 4, Interesting

I was under the impression that virtual ethernet devices intentionally don't bother verifying checksums, because they were intended to be used in situations where there is very little probability of the data being corrupted.
1. Re:I could have sworn this was intentional by TeknoHog · 2016-02-22 06:42 · Score: 1
  
  This. How do you get corrupted data from bad networking hardware into a virtual machine, without it passing through a real NIC first?
  
  --
  Escher was the first MC and Giger invented the HR department.
2. Re:I could have sworn this was intentional by Anonymous Coward · 2016-02-22 06:49 · Score: 5, Informative
  
  Most NICs don't drop packets with bad L3/L4 checksums. Instead they flag them as bad and pass them to software, and the packet doesn't get checked until it hits the TCP/IP stack. The problem is that in this configuration, the packet arrives at the physical NIC and is flagged as bad, but when it is passed through the veth device that flag is intentionally cleared, and only after passing through the veth device does it hit the TCP/IP stack. Because the checksum was marked as good the stack trusts it and passes the data up to the socket.
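The flow described above can be sketched as a toy model. This is not kernel code: `Packet`, `nic_receive`, `veth_forward` and `tcp_deliver` are hypothetical names invented for illustration, and only the two state names mirror the kernel's `CHECKSUM_NONE` and `CHECKSUM_UNNECESSARY` markers. The model also simplifies "flagged as bad" to "not vouched for by hardware".

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class IpSummed(Enum):
    CHECKSUM_NONE = 0         # nothing vouches for the checksum; the stack must verify it
    CHECKSUM_UNNECESSARY = 1  # checksum already verified; the stack may skip its own check


@dataclass(frozen=True)
class Packet:
    payload: bytes
    checksum_ok: bool   # ground truth: does the checksum really match the data?
    ip_summed: IpSummed


def nic_receive(payload: bytes, checksum_ok: bool) -> Packet:
    """Hypothetical NIC: vouches only for packets whose checksum it found to be good."""
    state = IpSummed.CHECKSUM_UNNECESSARY if checksum_ok else IpSummed.CHECKSUM_NONE
    return Packet(payload, checksum_ok, state)


def veth_forward(pkt: Packet, buggy: bool) -> Packet:
    """Hypothetical veth hop: the buggy variant upgrades unverified packets to 'verified'."""
    if buggy and pkt.ip_summed is IpSummed.CHECKSUM_NONE:
        return replace(pkt, ip_summed=IpSummed.CHECKSUM_UNNECESSARY)
    return pkt


def tcp_deliver(pkt: Packet) -> Optional[bytes]:
    """Hypothetical stack: verifies in software only when nothing vouches for the packet."""
    if pkt.ip_summed is IpSummed.CHECKSUM_NONE and not pkt.checksum_ok:
        return None  # software check fails, packet is dropped
    return pkt.payload


corrupt = nic_receive(b"garbled bytes", checksum_ok=False)
print(tcp_deliver(veth_forward(corrupt, buggy=True)))   # corrupt payload reaches the socket
print(tcp_deliver(veth_forward(corrupt, buggy=False)))  # None: dropped by the software check
```

In this model the only difference between the two runs is whether the veth hop overwrites the "unverified" marker, which is the behaviour the comment attributes to the bug.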
3. Re:I could have sworn this was intentional by butlerm · 2016-02-22 08:25 · Score: 1
  
  Most NICs don't drop packets with bad L3/L4 checksums
  Traditionally, NICs do not even "know" that there is such a thing as Layer 3, let alone check it in any way. L3 checksum validation is a bonus feature.
  Bad L3 checksums tend to be caused by defective networking hardware, and in this case the defective networking hardware of the recipient. If you are using checksum validation offload, ignoring the result in the presence of defective hardware isn't likely to make a difference either way.
Good it's fixed, but not too bad of a bug. by Anonymous Coward · 2016-02-22 06:25 · Score: 1

Any application that checked for proper data (encrypted links, ssh, etc) would have automatically been protected from this.
And, any attacker with access to the local network can already craft arbitrary TCP or other data and calculate a 'proper' checksum to have data pass up the stack.
So, I'm glad it's fixed...but hard to see why this made it to Slashdot!
1. Re:Good it's fixed, but not too bad of a bug. by AaronW · 2016-02-22 09:59 · Score: 2
  
  A lot of traffic is sent unencrypted because encryption just isn't needed. You don't get encryption for free in most cases since it requires a fair amount of CPU overhead to implement it and/or additional hardware, plus there's all the overhead of setting up an encrypted link. Within a LAN, encryption usually isn't required for most of the data being sent.
  As for corrupting packets, I had a setup in my cubicle a few weeks ago running 10G traffic where I could corrupt packets on request by switching the fluorescent light above my desk on and off. I was implementing support for a new phy chip connected to our ASIC that supports 100Base-T, 1GBase-T, 2.5G, 5G and 10GBase-T.
  
  --
  This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
Re:Data needed by putaro · 2016-02-22 06:41 · Score: 2

My experience has been that the TCP checksums are fairly useless - they can detect single bit errors only since they are just simple checksums, not CRCs or something more sophisticated. According to the article what was actually happening was that the virtual ethernet driver (veth) did not flag bad packets correctly. There's a flag that tells TCP there's no need for it to checksum since the hardware has already verified the packet. On errors, the veth driver set that flag instead of the one that says it couldn't verify the checksum.
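How weak the checksum is can be checked directly. The sketch below implements the 16-bit ones'-complement Internet checksum as described in RFC 1071 and compares it with CRC-32 from Python's `zlib`. The sample payload and the helper name `internet_checksum` are assumptions made for illustration. The checksum does catch a single flipped bit, but because it is a plain sum it cannot see two 16-bit words trading places, which CRC-32 does detect.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = b"hello, veth!"
bit_flipped = bytes([payload[0] ^ 0x01]) + payload[1:]      # one bit changed
words_swapped = payload[2:4] + payload[0:2] + payload[4:]   # first two 16-bit words exchanged

# A single-bit error changes the checksum.
print(internet_checksum(payload) != internet_checksum(bit_flipped))
# Reordered 16-bit words leave the checksum unchanged.
print(internet_checksum(payload) == internet_checksum(words_swapped))
# CRC-32 notices the reordering.
print(zlib.crc32(payload) != zlib.crc32(words_swapped))
```

All three comparisons print `True`: the sum is order-insensitive, so reordered or compensating errors slip through where a CRC would not.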
Re:Data needed by butlerm · 2016-02-22 06:46 · Score: 2

How often does the TCP/UDP checksum detect errors that the previous two could not?
TCP/UDP checksums are useful for one thing primarily - mitigating the effect of defective network hardware. That is about the only thing that can cause a transport level checksum error. Anything else is caught with a very high probability by Layer 2 protocols, which typically use a 32 bit CRC. Some Layer 2 protocols do have relatively weak checksums, but not so weak that TCP checksums are likely to catch much more than they do.
Re: corrupt data by LocutusOfBorg1 · 2016-02-22 07:09 · Score: 1

I wish I had mod points. Please mod parent +1 funny
I had the same thing by bytesex · 2016-02-22 07:19 · Score: 1

It turned out to be the fault of the VM and functionality offloading. See here: http://stackoverflow.com/quest...

--
Religion is what happens when nature strikes and groupthink goes wrong.
It's a feature not a bug by WaffleMonster · 2016-02-22 07:51 · Score: 1

The only purpose of the checksum is to increment a universally ignored error counter so operators know to replace broken hardware.
TCP checksums are wholly insufficient to prevent corruption of TCP streams at anything resembling a useful rate. It went unnoticed for years because checksums are irrelevant.
I've been bitten by this by shocking · 2016-02-22 08:30 · Score: 1

I make a local cache of debian packages on one of my VMs, using apt-mirror. From time to time one of the packages would fail its checksum - reloading it from the offsite source would usually work. When I changed the VM's ethernet device to a virtual e1000, the problems went away. I later found an interesting cabling issue that was causing transmission errors between a switch and the physical host.
How many eyes were looking at the Virtual Ethernet by williamyf · 2016-02-22 09:15 · Score: 1

How many eyes were looking at the Virtual Ethernet feature/code?
Clearly, not enough.
I've said it before and I say it again. You need enough QUALIFIED and MOTIVATED eyes. You also need clear QA test cases in order to render all bugs shallow.

--
*** Suerte a todos y Feliz dia!
Re:Data needed by williamyf · 2016-02-22 09:20 · Score: 1

How often does the TCP/UDP checksum detect errors that the previous two could not?
May I remind the distinguished audience that IPv6 does NOT have a Header checksum. Therefore, on IPv6, TCP/UDP/SCTP checks are MANDATORY in all cases (UDP checks were optional in IPv4; the guys doing VoIP are jumping for joy </sarcasm> about it...).
One REALLY NEEDS to do those checks.
(Computer networks teacher speaking here).

--
*** Suerte a todos y Feliz dia!
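As a rough illustration of the IPv6 point above: with no IP header checksum, the UDP checksum over IPv6 also covers a pseudo-header carrying the source and destination addresses, the upper-layer length and the next-header value, and a computed result of zero is sent as 0xFFFF. The sketch below follows that layout as given in the IPv6 specification. The function names, addresses and ports are hypothetical and chosen only for the example.

```python
import ipaddress
import struct

UDP_NEXT_HEADER = 17  # IP protocol number for UDP


def ones_complement_sum16(data: bytes) -> int:
    """Folded 16-bit ones'-complement sum (no final inversion)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, upper_len: int) -> bytes:
    """IPv6 pseudo-header: addresses, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", upper_len, UDP_NEXT_HEADER))


def udp6_checksum(src: str, dst: str, sport: int, dport: int, payload: bytes) -> int:
    length = 8 + len(payload)  # UDP header plus data
    header = struct.pack("!HHHH", sport, dport, length, 0)  # checksum field zeroed
    total = ones_complement_sum16(pseudo_header(src, dst, length) + header + payload)
    return (~total & 0xFFFF) or 0xFFFF  # zero is not allowed on IPv6; send 0xFFFF instead


def udp6_verify(src: str, dst: str, sport: int, dport: int, payload: bytes, csum: int) -> bool:
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, csum)
    # With a correct checksum filled in, the sum over everything is all ones.
    return ones_complement_sum16(pseudo_header(src, dst, length) + header + payload) == 0xFFFF


csum = udp6_checksum("2001:db8::1", "2001:db8::2", 12345, 53, b"hello")
print(udp6_verify("2001:db8::1", "2001:db8::2", 12345, 53, b"hello", csum))  # True
print(udp6_verify("2001:db8::1", "2001:db8::2", 12345, 53, b"hellp", csum))  # False
```

Because the addresses are part of the sum, a datagram delivered to the wrong address also fails verification, which is part of what the dropped IPv4 header checksum used to cover.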
XenServer? by slashcross · 2016-02-22 09:29 · Score: 1

Does anyone know if XenServer uses this functionality?

--
Slashdot your i and slashcross your t.
Re:Data needed by mikael · 2016-02-22 09:47 · Score: 1

I worked with very early internet technology in the 1990's. Back then, the network chip was on a separate board just like the graphics card. The MAC address had to be uploaded into flash memory on startup. These could blow up given the right conditions, then the card would either just keep blasting out random packet data, or traffic collisions would result in fragment packets (less than the minimum size) going out. Some early day drivers would pick up these packets. Filtering was done in software. Now with these chips built onto the motherboard, the hardware does all the filtering.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Data needed by AaronW · 2016-02-22 09:50 · Score: 1

I once ran into problems where a server with a certain Intel chip would sometimes corrupt data over the PCI bus. I had to put in a check in my driver to detect that chip and turn off a major PCI optimization if that one chip was detected. CRC errors would not detect it because that was handled in the network adapter. At the time some of my tests were with NetBEUI, which has no L3/L4 checksums, and resulted in corrupt files (which were detected by the test scripts).
I've run into a number of cases where data gets corrupted like this, and while the ip/tcp/udp checksum isn't all that great, it did allow identifying the problem.

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
Re:Data needed by flatulus · 2016-02-23 04:00 · Score: 1

I started working on transport protocols, and I always wondered about this:
ethernet has its crc, ipv4 has its crc, how often does the TCP/UDP checksum detect errors that the previous two could not?
TCP was finalized in 1981, long before modern Ethernet was around. TCP was originally developed for ARPAnet which used various longhaul communications technologies (think "modems") to interconnect sites. In those days, communications hardware usually did not have CRC or any other checksum checking. So TCP did its (simplistic) checksum to provide some protection.

IPv6 does not have the checksum, but the ethernet one is still there.
IPv6 came along much later, after Ethernet (and long-haul communications) had advanced to the point where CRC protection was a standard expectation. The value of a checksum in the IP header was recognized as sufficiently pointless to drop it.
The mystery to me is that an Ethernet NIC passes up a known corrupt packet, and the kernel doesn't drop it! I suppose this is so it's possible for a human to sense the presence of hardware failures, since pcap (aka tcpdump, aka WireShark) can show the corrupt packets. It really sucks that this, due to a subtle bug, could result in known bad packets leaking into apps which assume the lower layers did their job. That's what happened, yes?
Re:Data needed by flatulus · 2016-02-23 04:33 · Score: 1

How often does the TCP/UDP checksum detect errors that the previous two could not?
May I remind the distinguished audience that IPv6 does NOT have a Header checksum. Therefore, on IPv6, TCP/UDP/SCTP checks are MANDATORY in all cases...
One REALLY NEEDS to do those checks.
(Computer networks teacher speaking here).
May I remind the distinguished teacher that (a) the checksums in TCP and UDP are lame compared to CRC and (b) they are irrelevant given a sufficiently robust data link layer. As I said in another post just above, TCP and UDP originally included checksums because IP was being carried over lame data links, and so the checksums were a bit of "belt and suspenders". Very few data link protocols today lack a robust CRC, so the checksums are anachronistic.
The particular issue in this topic seems to be not that the Ethernet CRC32 is lame (it most certainly is not), but that the Linux network stack had a subtle bug introduced into it that caused known bad packets to be passed along as if they were not. This is not a failure of Ethernet, it is a failure of the Linux network stack. I for one am ecstatic that this has been found, because I think this might be what has been haunting a product of mine for several years! (hopefully preparing to rejoice).