Pushing the Limits of Network Traffic With Open Source (cloudflare.com)
An anonymous reader writes: CloudFlare's content delivery network relies on their ability to shuffle data around. As they've scaled up, they've run into some interesting technical limits on how fast they can manage this. Last month they explained how the unmodified Linux kernel can only handle about 1 million packets per second, when easily-available NICs can manage 10 times that. So, they did what you're supposed to do when you encounter a problem with open source software: they developed a patch for the Netmap project to increase throughput. "Usually, when a network card goes into the Netmap mode, all the RX queues get disconnected from the kernel and are available to the Netmap applications. We don't want that. We want to keep most of the RX queues back in the kernel mode, and enable Netmap mode only on selected RX queues. We call this functionality: 'single RX queue mode.'" With their changes, Netmap was able to receive about 5.8 million packets per second. Their patch is currently awaiting review.
If I have a 100Mb/s NIC, I'm only getting 10 MB/s on Linux? I doubt that.
Packets != Bytes
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
must be thoroughly considered. CloudFlare is the greatest Man-in-the-Middle on the Internet, and don't think for a second they're not collaborating with U.S. agencies that want to get at sensitive data going through their systems.
A packet is not a byte. A packet is a sequence of bits including an address, other header information and the actual payload.
An IPv4 packet, for example, has a header of at least 20 bytes (160 bits) and a maximum payload of 65,515 bytes (though often lower in practice).
If you were to send a lot of packets with only a single-byte payload, each packet would be 168 bits, and your 100 Mb/s link would carry about 600,000 packets per second. On a gigabit connection, though, such strange traffic would start to hit the actual limit.
Note that normally you would send more than a single byte of information per packet, so most real applications would need much higher speeds to hit the limit. With 105 bytes of information the total length is 1,000 bits, which puts you at about the limit on gigabit hardware. Still, most high-bandwidth traffic carries much more information in each packet and so does not usually hit such limits.
The limit has really started to matter now that 10 gigabit and faster network cards are widely available and coming down in price.
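For anyone who wants to check the arithmetic, here is a rough sketch in Python. It counts only a 20-byte IPv4 header plus the payload and ignores Ethernet framing, preamble and inter-frame gap, so the results are upper bounds. The function name `packets_per_second` and the chosen payload sizes are made up for this illustration.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options (assumption: no other framing counted)

def packets_per_second(link_bits_per_second, payload_bytes, header_bytes=IPV4_HEADER_BYTES):
    """Upper bound on packets/s for a link, counting only IP header + payload."""
    packet_bits = (header_bytes + payload_bytes) * 8
    return link_bits_per_second / packet_bits

if __name__ == "__main__":
    links = [("100 Mb/s", 100e6), ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]
    for label, rate in links:
        for payload in (1, 105, 1480):  # illustrative payload sizes in bytes
            pps = packets_per_second(rate, payload)
            print(f"{label:>8}, {payload:>4}-byte payload: {pps:,.0f} packets/s")
```

With a 1-byte payload a 100 Mb/s link works out to roughly 595,000 packets per second, and a 105-byte payload on a gigabit link gives 1,000,000, which is about where the stock kernel reportedly tops out.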
If they only need to "shuffle" packets around (i.e., not crack open the frames and actually interpret the data beyond making routing decisions), then routers/switches are better suited for this. If they actually need to do something more with the data, then that quoted rate of 5.8 million packets/sec will drop very quickly with each line of code they add that does anything with the data.
Sounds about right. Even if you were to ignore TCP/IP overhead, the most you could hope to achieve is 12.5MB/s over a 100Mb/s link. But that has nothing to do with what this article is about.
Wouldn't it just be easier to put this in systemd?
My employer deals with this on their multi-core MIPS processors. What we do is run Linux on one set of cores and dedicated applications on other cores. These applications offload most of the TCP/IP stack and only pass the relevant traffic to the kernel. The Ubiquiti EdgeRouter Lite uses one of our lowest-end chips and handles 1M packets/second. Our higher-end chips can easily handle far more packets. Then again, the dedicated cores are also able to take much better advantage of the hardware offload support for forwarding and filtering. Even without using the dedicated special application we can handle 40Gbps or more of traffic on the high-end chips. We can also handle stuff like IPsec at these rates, if coded properly, thanks to built-in encryption and hashing instructions.
Having the right NIC can also help, since some NICs can offload things like TCP/IP segmentation and reassembly. I've also dealt with small gigabit switch chips that can offload stuff like NAT, but Linux can't really take advantage of that as-is.
There's a lot of room for improvement. Some years ago I was doing performance analysis for Atheros with respect to CPU cache utilization. The biggest bottleneck was the fact that the transmit path in the Linux networking stack would only pass a single packet at a time. Batch processing of packets for WiFi makes a HUGE difference, since groups of packets need to be aggregated for 802.11n. It would also allow for more efficient packet processing for non-wireless traffic. There are a lot of other areas that could be improved as well.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
From TFA:
... which implies that NICs can easily manage more than 10 million packets per second, right?
5.8 million packets per second might be fast, but it is still _much lower_ than the theoretical >10 million packets per second max speed ...
I am curious, has any software (whether open source or proprietary) successfully passed the >10 million packets per second threshold yet?
Muchas Gracias, Señor Edward Snowden !
The general rule is divide-by-ten: a 100Mb link means 10MB of throughput. Divide by eight for the bit-to-byte conversion, but by ten to allow for overhead. It's not exact, but it's a good rule of thumb.
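As a quick sanity check of that rule of thumb, here is a minimal Python sketch comparing the raw bit-to-byte ceiling with the divide-by-ten estimate. Both function names are hypothetical, and the "overhead" is just the rough allowance from the rule, not a measured figure.

```python
def ceiling_megabytes_per_second(link_megabits_per_second):
    """Theoretical ceiling: 8 bits per byte, no protocol overhead at all."""
    return link_megabits_per_second / 8

def rule_of_thumb_megabytes_per_second(link_megabits_per_second):
    """Divide-by-ten estimate that leaves some headroom for overhead."""
    return link_megabits_per_second / 10

if __name__ == "__main__":
    for link in (100, 1000, 10000):  # link speeds in Mb/s
        print(f"{link} Mb/s: ceiling {ceiling_megabytes_per_second(link)} MB/s, "
              f"rule of thumb {rule_of_thumb_megabytes_per_second(link)} MB/s")
```

For a 100Mb link this gives a 12.5MB/s ceiling against a 10MB/s estimate, so the rule builds in about 20% headroom.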