Linux Virtual Ethernet Bug Delivers Corrupt TCP/IP Data (vijayp.ca)

Posted by timothy on Monday February 22, 2016 @05:22AM from the absolute-power dept.

jones_supa writes: Vijay Pandurangan from Twitter warns about a Linux kernel bug that causes containers using Virtual Ethernet devices for network routing to not check TCP checksums. Examples of software stacks that use Virtual Ethernet devices are Docker on IPv6, Kubernetes, Google Container Engine and Mesos. The kernel flaw results in applications incorrectly receiving corrupt data in a number of situations, such as with bad networking hardware. The bug is at least three years old: it is present in kernels as far back as the Twitter engineering team has tested. Their patch has been reviewed and accepted into the kernel, and is currently being backported to -stable releases back to 3.14 in various distributions. If you use containers in your setup, Pandurangan recommends that you deploy a kernel with this patch.

7 of 40 comments

Better late than never by Anonymous Coward · 2016-02-22 05:40 · Score: 4, Funny

After ten years and billions of dollars, Twitter has finally contributed something useful to society.
There's a term for that by halivar · 2016-02-22 05:59 · Score: 3, Insightful

"Cutting off your nose to spite your face."
I could have sworn this was intentional by Verdatum · 2016-02-22 06:23 · Score: 4, Interesting

I was under the impression that virtual ethernet devices intentionally don't bother verifying checksums, because they were intended to be used in situations where there is very little probability of the data being corrupted.
1. Re:I could have sworn this was intentional by Anonymous Coward · 2016-02-22 06:49 · Score: 5, Informative
  
  Most NICs don't drop packets with bad L3/L4 checksums. Instead they flag them as bad and pass them to software, and the packet doesn't get checked until it hits the TCP/IP stack. The problem is that in this configuration, the packet arrives at the physical NIC and is flagged as bad, but when it is passed through the veth device that flag is intentionally cleared, and only after passing through the veth device does it hit the TCP/IP stack. Because the checksum was marked as good, the stack trusts it and passes the data up to the socket.
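  For anyone who wants to see the flag handling described above in miniature, here is a rough Python sketch. It is a toy model, not kernel code: the names `Packet`, `Summed`, `veth_forward` and `tcp_receive` are made up for illustration, and the two states only loosely mirror the kernel's "not verified" and "already verified" checksum states.

```python
from dataclasses import dataclass
from enum import Enum


class Summed(Enum):
    """Hypothetical stand-in for the per-packet checksum status flag."""
    NONE = "not verified, software must check"
    UNNECESSARY = "already verified, skip the check"


@dataclass
class Packet:
    payload: bytes
    checksum_ok: bool              # ground truth: does the checksum really match?
    summed: Summed = Summed.NONE   # what the receive path has been told so far


def veth_forward(pkt: Packet, buggy: bool) -> Packet:
    """Pass a packet through the toy veth device.

    With buggy=True the device marks every unverified packet as verified,
    which is the behaviour described in the comment above.
    """
    if buggy and pkt.summed is Summed.NONE:
        pkt.summed = Summed.UNNECESSARY
    return pkt


def tcp_receive(pkt: Packet):
    """Return the payload handed to the socket, or None if the packet is dropped."""
    if pkt.summed is Summed.UNNECESSARY:
        return pkt.payload          # the stack trusts the flag and skips its own check
    return pkt.payload if pkt.checksum_ok else None


if __name__ == "__main__":
    for buggy in (True, False):
        corrupt = Packet(b"corrupt", checksum_ok=False)
        print(buggy, tcp_receive(veth_forward(corrupt, buggy=buggy)))
```

  In this model the corrupt packet reaches the socket only when the veth step overwrites the flag. With the flag left alone, the software check runs and drops it.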
Re:Data needed by putaro · 2016-02-22 06:41 · Score: 2

My experience has been that the TCP checksums are fairly useless: they can detect only single-bit errors, since they are just simple checksums, not CRCs or something more sophisticated. According to the article, what was actually happening was that the virtual ethernet driver (veth) did not flag bad packets correctly. There's a flag that tells TCP there's no need for it to checksum since the hardware has already verified the packet. On errors, the veth driver set that flag instead of the one that says it couldn't verify the checksum.
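To make the "simple checksum" point concrete, here is a minimal sketch of the 16-bit ones' complement sum that TCP uses (the Internet checksum, RFC 1071 style). It covers only the bytes passed in, whereas a real TCP checksum also covers a pseudo-header. The function name and the sample bytes are illustrative assumptions.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back into the low 16 bits
    return ~total & 0xFFFF


original = bytes.fromhex("123456789abc")
bit_flipped = bytes.fromhex("123556789abc")        # one bit changed in the first word
word_swapped = bytes.fromhex("567812349abc")       # first two 16-bit words reordered

single_bit_detected = internet_checksum(bit_flipped) != internet_checksum(original)
word_swap_detected = internet_checksum(word_swapped) != internet_checksum(original)

if __name__ == "__main__":
    print("single-bit flip detected:", single_bit_detected)
    print("word swap detected:", word_swap_detected)
```

Because addition is commutative, reordering whole 16-bit words leaves the sum unchanged, so that kind of damage slips through, while a single flipped bit always changes the result.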
Re:Data needed by butlerm · 2016-02-22 06:46 · Score: 2

How often does the TCP/UDP checksum detect errors that the previous two could not?
TCP/UDP checksums are useful for one thing primarily - mitigating the effect of defective network hardware. That is about the only thing that can cause a transport level checksum error. Anything else is caught with a very high probability by Layer 2 protocols, which typically use a 32 bit CRC. Some Layer 2 protocols do have relatively weak checksums, but not so weak that TCP checksums are likely to catch much more than they do.
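As a rough illustration of the gap between a 32 bit CRC and a 16-bit sum, the sketch below damages a frame so that two words change by offsetting amounts. It uses Python's `zlib.crc32`, which uses the same generator polynomial as the Ethernet frame check sequence. The helper `sum16` and the sample bytes are assumptions made for the example.

```python
import zlib


def sum16(data: bytes) -> int:
    """16-bit ones' complement sum of big-endian words (even length assumed)."""
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries into the low 16 bits
    return total


frame = bytes.fromhex("0102030405060708")
# Hypothetical damage: the first word goes up by one, the second goes down by one.
damaged = bytes.fromhex("0103030305060708")

sum_detects = sum16(frame) != sum16(damaged)
crc_detects = zlib.crc32(frame) != zlib.crc32(damaged)

if __name__ == "__main__":
    print("16-bit sum detects the damage:", sum_detects)
    print("CRC-32 detects the damage:", crc_detects)
```

The offsetting changes cancel in the sum, while the CRC still differs because the damage fits within a 32-bit burst, which a 32 bit CRC always detects.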
Re:Good it's fixed, but not too bad of a bug. by AaronW · 2016-02-22 09:59 · Score: 2

A lot of traffic is sent unencrypted because encryption just isn't needed. You don't get encryption for free in most cases since it requires a fair amount of CPU overhead to implement it and/or additional hardware, plus there's all the overhead of setting up an encrypted link. Within a LAN, encryption usually isn't required for most of the data being sent.
As for corrupting packets, I had a setup in my cubicle a few weeks ago running 10G traffic where I could corrupt packets on request by switching the fluorescent light above my desk on and off. I was implementing support for a new phy chip connected to our ASIC that supports 100Base-T, 1GBase-T, 2.5G, 5G and 10GBase-T.

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.