Slashdot Mirror


Facebook Seeks Devs To Make Linux Network Stack As Good As FreeBSD's

An anonymous reader writes: Facebook posted a career application which, in their own words, is 'seeking a Linux Kernel Software Engineer to join our Kernel team, with a primary focus on the networking subsystem. Our goal over the next few years is for the Linux kernel network stack to rival or exceed that of FreeBSD.' Two interesting bullet points listing "responsibilities": Improve IPv6 support in the kernel, and eliminate perf and stability issues. FB is one of the world's largest IPv6 deployments; Investigate and participate in emerging protocols (MPTCP, QUIC, etc.) discussions, implementation, experimentation, tooling, etc.

9 of 195 comments

  1. Re:This does pose the question: by HuguesT · · Score: 4, Informative

    I love FreeBSD, I support them financially every year, and I use it daily but it is not uniformly better than Linux. Hardware support, in particular, is very far behind. Two random examples:

    1- My NAS system does not recognise any USB storage device plugged in after boot (no hotplug). It does not support USB SuperSpeed (USB 3.0) either (I have to boot in compatibility mode by disabling xHCI in the BIOS). This is a known issue with some Asus motherboards, still unfixed in 10.0.
    2- FreeBSD does not install on some of my HP G6 servers. The kernel simply segfaults. I really wanted FreeBSD on this hardware, so I run it in a VM under Linux (using KVM). Has been running brilliantly for about 2 years now.

    Also, security updates in FreeBSD are really difficult. I haven't finished dealing with updating my ports since I moved from 9.2 to 9.3 last week.

    I have to say this though: when it runs, it runs really well.

  2. Re:FreeBSD network stack by Zarjazz · · Score: 5, Informative

    As someone who has used various BSDs and Linux in large-scale environments, and is a fan of both, I've configured servers with multiple 10Gb interfaces handling 100k+ requests a second, and I honestly can't think of any example where Linux has been inferior. The often-repeated line that FreeBSD has a better networking stack was probably true over 10 years ago with Linux 2.2 and earlier, but since then I'd say that myth is just bullcrap.

    Maybe Facebook are talking about some specific IPv6 or cutting edge features like MPTCP they need on their network, but as a general statement it's utterly misleading.

  3. Re:FreeBSD network stack by Bengie · · Score: 4, Informative

    A lot of sysadmins from companies that push a lot of data over lots of connections have blogs about tweaking your OS to handle stuff like 10gb+ of traffic and millions of connections. A lot of these people complain about Linux having strange problems under these loads, and FreeBSD just seems to work. Linux may be faster in some cases, but it still has stability issues that are hard to debug.

    Then there's the whole thing about most network stack research happening primarily on FreeBSD because of licensing. There's a new zero-copy network API that was developed in FreeBSD that allows line-rate 64-byte 10Gb traffic on a 450 MHz quad-core CPU. Linux and old-API FreeBSD managed about 1/10th the packets per second.
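
    For a sense of what "line rate with 64-byte packets" means, here is a rough back-of-the-envelope calculation. It only assumes standard Ethernet framing overhead (8 bytes of preamble plus a 12-byte inter-frame gap per frame); the function name is made up for illustration.

```python
# Back-of-the-envelope line-rate calculation for Ethernet.
# Assumption: besides the frame itself, every frame costs 8 bytes of
# preamble/start delimiter and a 12-byte inter-frame gap on the wire,
# so a 64-byte frame occupies 84 bytes of link time.
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def line_rate_pps(link_bits_per_sec: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_sec / wire_bits


if __name__ == "__main__":
    pps = line_rate_pps(10e9, 64)
    print(f"{pps / 1e6:.2f} Mpps")           # 14.88 Mpps
    print(f"{1e9 / pps:.1f} ns per packet")  # 67.2 ns
```

    So the per-packet budget at 10Gb line rate is roughly 67 ns, which is why per-packet system calls and copies become the bottleneck.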

    A new thread-friendly socket API has just been pushed to FreeBSD 11. One of Netflix's engineers had a pet project that now allows near-zero lock-contention thread scaling. He was able to do line-speed 40gb/s with 150k TCP sessions. Instead of having one file descriptor with a single listening thread, you have one file descriptor and listening thread per RSS (receive-side scaling) queue from the NIC, and you can lock your thread to the same CPU as the RSS queue, so the packet is already in L2 cache. No shared network state. This also means no shared locks, with nearly perfect linear scaling and virtually no cache thrashing or bouncing.

    They're starting work to extend the API to also allow the OS to better handle NUMA and to attach the RSS queues to the CPU to which the NIC is attached. This will remove virtually all cross-talk among the CPU cores trying to handle the network state.

    They're looking into expanding this same concept to the Storage IO system.
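
    For anyone who wants to play with the one-listener-per-worker idea from userland, here is a rough sketch. To be clear, this is not the FreeBSD 11 API described above: it assumes Linux's SO_REUSEPORT behaviour (several sockets bound to the same port, with the kernel spreading new connections among them) and Linux's os.sched_setaffinity for pinning. It does not tie a socket to a specific NIC queue. The helper names are made up.

```python
import os
import socket
import threading


def make_listener(port: int) -> socket.socket:
    """Create one listening socket; several may share a port via SO_REUSEPORT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s


def worker(cpu: int, listener: socket.socket, stop: threading.Event) -> None:
    """Accept loop that owns its own socket and shares no state with peers."""
    # Assumption: on Linux, pid 0 pins the calling thread. Pinning is best
    # effort here; it is skipped where unsupported or not permitted.
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass
    listener.settimeout(0.2)
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue
        with conn:
            conn.sendall(f"worker {cpu}\n".encode())


if __name__ == "__main__":
    stop = threading.Event()
    first = make_listener(0)              # let the OS pick a free port
    port = first.getsockname()[1]
    listeners = [first] + [make_listener(port) for _ in range(3)]
    threads = [
        threading.Thread(target=worker, args=(cpu, lsn, stop))
        for cpu, lsn in enumerate(listeners)
    ]
    for t in threads:
        t.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        print(client.recv(64).decode().strip())
    stop.set()
    for t in threads:
        t.join()
    for lsn in listeners:
        lsn.close()
```

    Which worker answers a given connection is up to the kernel, so the printed worker number will vary between runs.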

  4. Re:FreeBSD network stack by phoenix_rizzen · · Score: 3, Informative

    Google searches for "netmap" and "FreeBSD" will give you lots of information on pushing millions of pps through 900 MHz single-core machines. Netmap is also available on Linux. There's even a netmap-enabled version of IPFW that allows you to do packet filtering and routing completely in userspace, again with millions of pps. IPFW is also available on Linux, although I don't know if the netmap-enabled version is.

    Google searches for "openconnect" and "FreeBSD" will give you lots of information and blog posts from the Netflix guys about why they picked FreeBSD, and how it all works, including details on the networking.

    Google searches for "Adrian Chadd", or "RSS scaling", or similar terms will show you threads and posts on various FreeBSD mailing lists with information detailing a lot of the MSS/RSS work that's going into FreeBSD 11, and several projects that build off that. Those also have links to other information around sockets and similar.

    Google searches for "NUMA" and "FreeBSD" will bring up mailing list threads that cover the different projects being undertaken to improve the CPU affinity and thread locality and all that jazz.

    Sure, it would be nice if the OP had posted links to the info, but it's not like the information is secret or hard to find.

  5. High perf SMP coding is in a category of its own by m.dillon · · Score: 5, Informative

    Designing algorithms that play well in an SMP environment under heavy loads is not easy. It isn't just a matter of locking within the protocol stack... contention between cpus can get completely out of control even from small 6-instruction locking windows. And it isn't just the TCP stack which needs to be contention-free. The *entire* packet path from the hardware all the way through to the system calls made by userland has to be contention-free. Plus the scheduler has to be able to optimize the data flow to reduce unnecessary cache mastership changes.

    It's fun, but so many kernel subsystems are involved that it takes a very long time to get it right. And there are only a handful of kernel programmers in the entire world capable of doing it.
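
    A toy illustration of the "no shared state" idea, for readers who have not seen it: instead of every worker taking one lock to bump one counter, each worker gets its own slot and the slots are only summed when somebody reads the total. This is a userland sketch with made-up names, not kernel code; a real kernel would also pad each slot to its own cache line, and Python's interpreter hides the actual cache effects.

```python
import threading


class ShardedCounter:
    """Counter split into per-worker slots so writers never share a lock."""

    def __init__(self, n_shards: int) -> None:
        self._shards = [0] * n_shards

    def add(self, shard: int, amount: int = 1) -> None:
        # Each slot has exactly one writer (its owning worker), so the
        # update path takes no lock at all.
        self._shards[shard] += amount

    def total(self) -> int:
        # The read side pays for the summation; the hot path stays cheap.
        return sum(self._shards)


def count_packets(counter: ShardedCounter, shard: int, n: int) -> None:
    """Stand-in for a per-CPU packet handler bumping its own statistics."""
    for _ in range(n):
        counter.add(shard)


def run(n_workers: int = 4, per_worker: int = 10_000) -> int:
    counter = ShardedCounter(n_workers)
    threads = [
        threading.Thread(target=count_packets, args=(counter, i, per_worker))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()


if __name__ == "__main__":
    print(run())
```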

    -Matt

  6. Re:This does pose the question: by lgw · · Score: 4, Informative

    If you have 1 million Linux machines deployed, with full Linux-specific software stacks on each, it's cheaper and easier (and most likely faster) to fix the problems you see in Linux than to move the fleet to a new OS.

    Facebook's dev shop culture is all about banging out code as fast as possible for the problem in front of them, then moving on. Forward planning isn't really the thing there, from what I hear (though I think you're no longer discouraged from testing your code before it goes live, that's a recent change). Moving to BSD might well be a better long term plan, but it would take years to get there and they don't really think on that timescale, from the rumors I hear anyhow.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  7. Re:Why not rebuild your ports? by Anonymous Coward · · Score: 5, Informative

    If you are managing a bunch of (more than 2) FreeBSD machines, building ports on the live system is wrong and counterproductive. In this kind of scenario you really should set up a package-building machine and distribute your own binary packages to them. With pkgng this really works well.

    This applies if you don't want to use the official binary packages (for example because you need different options from the defaults, or you tweak some ports)

  8. Re:FreeBSD network stack by Lennie · · Score: 3, Informative

    The MPTCP stack for Linux isn't in mainline, but is much further ahead than the FreeBSD version (not sure if that is in mainline).

    --
    New things are always on the horizon
  9. It's not just NetMap that brings performance ... by Anonymous Coward · · Score: 3, Informative

    FreeBSD also includes an alternative to select/poll called kqueue that allows it to scale client connections massively with minimal performance degradation. Linux introduced epoll as a work-alike, but it has some drawbacks ...

    http://www.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
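
    If you just want to see the two mechanisms from application code, Python's standard selectors module is one way: DefaultSelector picks the most efficient backend available on the platform, which is normally the epoll-based one on Linux and the kqueue-based one on FreeBSD. This is only a minimal sketch of readiness notification on a socket pair (the helper name is made up), not a benchmark of either interface.

```python
import selectors
import socket


def wait_readable(sel: selectors.BaseSelector, timeout: float) -> list:
    """Return the registered objects that are ready for reading."""
    return [key.fileobj for key, _events in sel.select(timeout)]


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    # Shows which backend was chosen on this machine.
    print(type(sel).__name__)

    a, b = socket.socketpair()
    sel.register(b, selectors.EVENT_READ)

    print(wait_readable(sel, 0))         # nothing has been sent yet
    a.send(b"ping")
    for sock in wait_readable(sel, 1.0):
        print(sock.recv(16))             # b'ping'

    sel.close()
    a.close()
    b.close()
```

    The same program runs unchanged on both systems; only the selector class behind DefaultSelector differs.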

    What's a massive scale? WhatsApp, recently acquired by Facebook, uses FreeBSD and Erlang to power its service offerings. They sustain over 2 million simultaneous client connections per FreeBSD server ...

    http://blog.whatsapp.com/196/1-million-is-so-2011

    I wouldn't be surprised if the internal comparison between Linux and FreeBSD network features/performance was fueled by feedback from their new subsidiary.

    FreeBSD also works very closely with the Nginx community. If you look at the dev mailing list, you will see a fair amount of kernel-level dev work sponsored by companies that use nginx on top of FreeBSD. This constant tuning keeps nginx consumers loyal to FreeBSD for obvious reasons. It is no wonder that this combination was selected by Netflix to power their new content delivery network.