Slashdot Mirror


Facebook Seeks Devs To Make Linux Network Stack As Good As FreeBSD's

An anonymous reader writes: Facebook posted a career application which, in their own words, is 'seeking a Linux Kernel Software Engineer to join our Kernel team, with a primary focus on the networking subsystem. Our goal over the next few years is for the Linux kernel network stack to rival or exceed that of FreeBSD.' Two interesting bullet points listing "responsibilities": Improve IPv6 support in the kernel, and eliminate perf and stability issues. FB is one of the world's largest IPv6 deployments; Investigate and participate in emerging protocols (MPTCP, QUIC, etc.) discussions, implementation, experimentation, tooling, etc.

42 of 195 comments

  1. LOL, so why not use theirs? by gstoddart · · Score: 5, Insightful

    Look, this is FreeBSD ... why not just take their damned code?

    It's not like you're not allowed to do that. That's what is great about the BSD license.

    If FreeBSD's network stack is what you aspire to, why reinvent the wheel?

    --
    Lost at C:>. Found at C.
    1. Re:LOL, so why not use theirs? by Bengie · · Score: 4, Insightful

      You can't just copy/paste code and expect it to work; it must be refactored, and may not even be compatible until completely new features are added in the current code to allow the new code to function. It's like saying "that fusion reactor is an open design, why not just place that in our coal power plant?". You may need to make some changes to your current power plant before you change its core.

    2. Re:LOL, so why not use theirs? by gstoddart · · Score: 3, Insightful

      Licensing? Can code released under the BSD license be re-released under the GPL?

      Been years since I read the license, but the BSD licenses are pretty permissive, to the extent you can take BSD stuff and use it as a basis for commercial products.

      BSD has always been about writing awesome code, and letting people do what they want with it, as opposed to imposing ideology on people.

      I'd be surprised if Linux doesn't already have code from FreeBSD in it.

      --
      Lost at C:>. Found at C.
    3. Re:LOL, so why not use theirs? by Anonymous Coward · · Score: 2, Informative

      Licensing? Can code released under the BSD license be re-released under the GPL?

      IANAL.

      Sort of. Technically, as you are (presumably) not the copyright holder of the work, you would have no permission to grant any sort of copyright license to third parties.

      But the license used by FreeBSD is very permissive. Anyone is permitted to copy, modify, redistribute, or redistribute modified versions of the software and essentially the only requirement is that you also include the three parts of the license text, verbatim, somewhere in all copies.

      So what this allows you to do is to create a modified version of the software. If your modifications are non-trivial, they may be a copyrightable work themselves. You may license your modifications to others under the terms of the GPL. The result is a combined work, with portions of it under the GPL license, and portions of it under the terms of the FreeBSD license. This works because someone can distribute the combination and simultaneously satisfy the conditions of both licenses: in addition to the GPL requirements, one must also include the text as required by the FreeBSD license.

    4. Re:LOL, so why not use theirs? by mi · · Score: 3

      Check your /usr/include/X11/extensions/fontcachstr.h for example. Or, to put us back on topic of FreeBSD networking, the <netinet/igmp.h>, <netinet/if_ether.h>, and <net/ethernet.h>...

      --
      In Soviet Washington the swamp drains you.
  2. Biggest troll on Slashdot ever by CAPSLOCK2000 · · Score: 2

    Insulting their work might not be the right way to get the best Linux kernel network engineers to join your company.

    1. Re:Biggest troll on Slashdot ever by Anonymous Coward · · Score: 3, Funny

      Insulting their work might not be the right way to get the best Linux kernel network engineers to join your company.

      Or it might be the best way ever. Linux people and their egos....

  3. Re:Why not just use FreeBSD then? by halivar · · Score: 2

    If they've done a lot of work already on a custom kernel, it may not make sense to try porting all that work to completely different kernel architecture. They may have done a cost benefit analysis and decided that the cost of improving their current architecture is less than retrofitting FreeBSD. Not saying this is the case, just posing a possible scenario where this would be the better option.

  4. Re:Why not just use FreeBSD then? by discord5 · · Score: 4, Funny

    It might be a silly question, but why don't they just use FreeBSD in that case?

    You haven't heard? I'm sorry... My condolences, but the writing had been on the wall for a while though. Netcraft even confirmed it years ago...

    *BSD is dying.

  5. FreeBSD network stack by CadentOrange · · Score: 4, Insightful

    What makes the FreeBSD network stack superior?

    1. Re:FreeBSD network stack by gstoddart · · Score: 4, Funny

      Way more cowbell, and a much cooler logo.

      --
      Lost at C:>. Found at C.
    2. Re:FreeBSD network stack by Zarjazz · · Score: 5, Informative

      As someone who has used various BSDs and Linux in large-scale environments, and is a fan of both, I've configured servers with multiple 10Gb interfaces handling 100k+ requests a second, and I honestly can't think of any example where Linux has been inferior. The often-repeated line that FreeBSD has a better networking stack was probably true over 10 years ago with Linux 2.2 and earlier, but since then I'd say that myth is just bullcrap.

      Maybe Facebook are talking about some specific IPv6 or cutting edge features like MPTCP they need on their network, but as a general statement it's utterly misleading.

    3. Re:FreeBSD network stack by Bengie · · Score: 4, Informative

      A lot of sysadmins from companies that push a lot of data over lots of connections have blogs about tweaking your OS to handle stuff like 10gb+ of traffic and millions of connections. A lot of these people complain about Linux having strange problems under these loads, and FreeBSD just seems to work. Linux may be faster in some cases, but it still has stability issues that are hard to debug.

      Then there's the whole thing about most network stack research happening primarily on FreeBSD because of licensing. There's a new zero-copy network API that was developed in FreeBSD that allows line-rate 64-byte 10Gb traffic on a 450MHz quad-core CPU. Linux and old-API FreeBSD managed about 1/10th of the packets per second.

      A new thread-friendly socket API has just been pushed to FreeBSD 11. One of Netflix's engineers had a pet project that now allows near-zero lock-contention thread scaling. He was able to do line speed 40Gb/s with 150k TCP sessions. Instead of having one file descriptor with a single listening thread, you have one file descriptor and listening thread per RSS queue from the NIC, and you can lock your thread to the same CPU as the RSS queue, so the packet is already in L2 cache. No shared network state. This also means no shared locks, with nearly perfect linear scaling and virtually no cache thrashing or bouncing.

      They're starting work to extend the API to also allow the OS to better handle NUMA and to attach the RSS queues to the CPU to which the NIC is attached. This will remove virtually all cross-talk among the CPU cores trying to handle the network state.

      They're looking into expanding this same concept to the Storage IO system.
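
      The per-queue idea described in this comment can be illustrated with a minimal sketch. This is not the FreeBSD API: the names `flow_to_queue`, `QueueWorker`, and `dispatch` are hypothetical, and CRC32 is only a stand-in for the hash a real NIC would apply (typically a Toeplitz hash). The point is that every packet of a flow lands on the same queue, so each worker owns its state outright and needs no locks.

```python
import socket
import struct
import zlib
from collections import Counter

NUM_QUEUES = 4  # assumption: one receive queue (and one worker) per CPU core


def flow_to_queue(src_ip, src_port, dst_ip, dst_port, num_queues=NUM_QUEUES):
    """Map a TCP/UDP 4-tuple to a queue index.

    CRC32 is a stand-in for the NIC's hash; any deterministic hash
    keeps all packets of one flow on the same queue.
    """
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key) % num_queues


class QueueWorker:
    """State owned by exactly one queue; nothing here is shared or locked."""

    def __init__(self, queue_id):
        self.queue_id = queue_id
        self.bytes_per_flow = Counter()

    def handle(self, flow, length):
        self.bytes_per_flow[flow] += length


def dispatch(packets, num_queues=NUM_QUEUES):
    """Deliver (flow, length) pairs to the worker that owns each flow."""
    workers = [QueueWorker(i) for i in range(num_queues)]
    for flow, length in packets:
        workers[flow_to_queue(*flow, num_queues)].handle(flow, length)
    return workers


flow_a = ("10.0.0.1", 40000, "10.0.0.2", 80)
flow_b = ("10.0.0.3", 40001, "10.0.0.2", 80)
workers = dispatch([(flow_a, 100), (flow_b, 50), (flow_a, 25)])
```

      In a real deployment each worker would be a thread pinned to the CPU that services its queue; the sketch only shows why no state has to cross between them.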

    4. Re:FreeBSD network stack by phoenix_rizzen · · Score: 3, Informative

      Google searches for "netmap" and "FreeBSD" will give you lots of information on pushing millions of pps through 900 MHz single-core machines. Netmap is also available on Linux. There's even a netmap-enabled version of IPFW that allows you to do packet filtering and routing completely in userspace, again with millions of pps. IPFW is also available on Linux, although I don't know if the netmap-enabled version is.

      Google searches for "openconnect" and "FreeBSD" will give you lots of information and blog posts from the Netflix guys about why they picked FreeBSD, and how it all works, including details on the networking.

      Google searches for "Adrian Chadd", or "RSS scaling", or similar terms will show you threads and posts on various FreeBSD mailing lists with information detailing a lot of the MSS/RSS work that's going into FreeBSD 11, and several projects that build off that. Those also have links to other information around sockets and similar.

      Google searches for "NUMA" and "FreeBSD" will bring up mailing list threads that cover the different projects being undertaken to improve the CPU affinity and thread locality and all that jazz.

      Sure, it would be nice if the OP had posted links to the info, but it's not like the information is secret or hard to find.

    5. Re:FreeBSD network stack by Lennie · · Score: 3, Informative

      The MPTCP stack for Linux isn't in mainline, but is much further ahead than the FreeBSD version (not sure if that is in mainline).

      --
      New things are always on the horizon
    6. Re:FreeBSD network stack by Anonymous Coward · · Score: 2, Informative

      Hi! adri here. I'm Adrian (adrian@freebsd.org) the ex-Netflix engineer who was doing this as his project whilst I was working there.

      I'm continuing to do it in my spare time now, as time and hardware permit.

      https://wiki.freebsd.org/NetworkRSS

      I'm working out the kinks in how IP fragments are correctly handled. It'll be more useful for real world deployments after that - unfortunately in the real world you have to deal with fragmented TCP and UDP; you can't just pretend it's not there.

      I'll continue chipping away at it after the Defcon weekend. It's nice to see no lock contention between CPUs when doing TCP and UDP workloads. The main thing that's left is NUMA awareness.

      I'll likely do up a memcached style example once this is done so we can really stress it. I'm pretty sure I can get a few million transactions a second out of it on 8-core single-socket hardware - without RDMA, without crazy vendor-specific products.

      The thing to keep in mind is that I'm doing this in my spare time now and it's not being funded. I've had 10G and soon 40G NICs donated to me and my employer will likely loan me some 8, 16 and 32 core machines to expand upon this.

  6. Re:This does pose the question: by HuguesT · · Score: 4, Informative

    I love FreeBSD, I support them financially every year, and I use it daily but it is not uniformly better than Linux. Hardware support, in particular, is very far behind. Two random examples:

    1- My NAS system does not recognise any USB storage when they are plugged in after boot (no hotplug). It does not support USB superspeed (USB 3.0) either (I have to boot in compatibility mode by disabling xHCI in the BIOS). This is a known issue with some Asus motherboards, still unfixed in 10.0
    2- FreeBSD does not install on some of my HP G6 servers. The kernel simply segfaults. I really wanted FreeBSD on this hardware, so I run it in a VM under Linux (using KVM). Has been running brilliantly for about 2 years now.

    Also, security updates in FreeBSD are really difficult. I haven't finished dealing with updating my ports since I moved from 9.2 to 9.3 last week.

    I have to say this though: when it runs, it runs really well.

  7. Re:Why not just use FreeBSD then? by HuguesT · · Score: 2

    May be multiple issues. Perhaps better OpenMP support? Maybe NUMA? Maybe Linux has a better virtual machine infrastructure? Maybe hardware support.

  8. Re:Won't Happen by fuzzyfuzzyfungus · · Score: 2

    Why is the organizational separation between the kernel and the userland going to affect the quality of the TCP/IP stack?

  9. Corporate Contributions to OSS by phizi0n · · Score: 5, Insightful

    I don't understand why there's all these comments saying they should just use FreeBSD. There are many reasons to despise Facebook but their desire to improve the Linux networking stack is admirable. We should be encouraging corporations to contribute to OSS, not telling them to just use that other thing that is better in some ways but not others. Kudos to them for contributing back to the projects they use.

  10. Re:This does pose the question: by ci4 · · Score: 4, Interesting

    USB3 support in FreeBSD 10 is OK (bunch of external disks used for PC backup - speed was essential). No problem with hot-plug either. Ports upgrade is trivial (although I have switched to pkg-ng now). I really can't see why you think that security updates are difficult either. I've got only one 9.2 system around, which at the moment I am not bothered to upgrade.

  11. High perf SMP coding is in a category of its own by m.dillon · · Score: 5, Informative

    Designing algorithms that play well in an SMP environment under heavy loads is not easy. It isn't just a matter of locking within the protocol stack... contention between cpus can get completely out of control even from small 6-instruction locking windows. And it isn't just the TCP stack which needs to be contention-free. The *entire* packet path from the hardware all the way through to the system calls made by userland has to be contention-free. Plus the scheduler has to be able to optimize the data flow to reduce unnecessary cache mastership changes.

    It's fun, but so many kernel subsystems are involved that it takes a very long time to get it right. And there are only a handful of kernel programmers in the entire world capable of doing it.

    -Matt

  12. Re:This does pose the question: by jdew · · Score: 2

    What asus board? I've got an asus crosshair iv formula and the usb3 has been working fine in both 9.2 and 9.3. The aibs driver doesn't support this board yet though.

    as for security updates:
    freebsd-update fetch; freebsd-update install

    and port updates:
    portsnap fetch update; portmaster -gda

    So not sure what the issues are here.

  13. Re:Why not just use FreeBSD then? by dr.Flake · · Score: 2

    Ye gods I'm so tired of this 'joke' cropping up every so often. It wasn't really that funny 12 years ago, either.

    Actually,

    i still find the dying remark funny.
    And the more time has passed, the funnier it gets. Still being around and still being a relevant OS proves how silly the study was at the time.

    So i hope we can still laugh about it in 30 years or so.

    --
    Why are other peoples sig's always more witty ???
  14. Re:Will Happen by phoenix_rizzen · · Score: 2

    Except when you start talking about netmap. :) That's a userspace network stack that can push millions of pps, on sub-GHz systems.

    There's even a netmap-enabled version of the IPFW packet filter that runs in userspace, filtering millions of pps on sub-GHz systems.

    And there's an applications ecosystem starting to grow around netmap that keeps all network-related packet processing in userspace.

    As a twist, netmap and IPFW are also available on Linux, and provide better performance than the in-kernel network stack and iptables. :)

  15. Re:This does pose the question: by TheRaven64 · · Score: 4, Interesting

    I'm pretty sure that Facebook buys enough hardware that they can afford to write drivers for anything they're missing and demand FreeBSD support from vendors for their next round of purchases. Netflix already does this (they won't buy any hardware that doesn't have vendor support for FreeBSD), as do a few other companies, and so a number of NIC vendors (particularly in the 10G/40G space) are now putting quite a bit of effort into their FreeBSD drivers.

    --
    I am TheRaven on Soylent News
  16. Re:This does pose the question: by Fweeky · · Score: 4, Interesting

    pkgng's made port upgrading much less burdensome - even fairly complex dependency changes can be handled automatically as of 1.3, and the official package repositories are a lot more useful now. They even have stable security-fix-only branches.

    I still make my own customised builds, but I make binary packages in an isolated jail using poudriere. 99% of upgrades are a matter of updating its ports tree, running rebuild-packages, and running pkg upgrade on all my machines.

    You couldn't pay me to go back to portupgrade/portmaster/portmanager.

  17. Re:Won't Happen by TheRaven64 · · Score: 4, Interesting

    Actually, that's not always true. FreeBSD ships with netmap, which allows you to talk to the network hardware directly from userspace. A significant chunk of the DNS root zone is served by FreeBSD boxes using a completely custom TCP/IP stack on top of netmap. There's a paper at this year's SIGCOMM about building specialised network stacks in this infrastructure.

    If you're talking about the FreeBSD TCP/IP stack, then libuinet allows running it entirely in userspace.

    These might not be the ones that Facebook is interested in, but a significant amount of their workload could be sped up by using the work described in the SIGCOMM paper...

    --
    I am TheRaven on Soylent News
  18. Re:This does pose the question: by lgw · · Score: 4, Informative

    If you have 1 million Linux machines deployed, with full Linux-specific software stacks on each, it's cheaper and easier (and most likely faster) to fix the problems you see in Linux than to move the fleet to a new OS.

    Facebook's dev shop culture is all about banging out code as fast as possible for the problem in front of them, then moving on. Forward planning isn't really the thing there, from what I hear (though I think you're no longer discouraged from testing your code before it goes live, that's a recent change). Moving to BSD might well be a better long term plan, but it would take years to get there and they don't really think on that timescale, from the rumors I hear anyhow.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  19. Re:This does pose the question: by Pieroxy · · Score: 4, Insightful

    Don't you think it's easier and cheaper to optimize the network stack of Linux rather than writing tons of hardware drivers for FreeBSD? Hardware which, most of the time, will be undocumented. Furthermore, when you change your servers, yay, more drivers to write...

  20. Re:Remember Microsoft Windows? by Dahan · · Score: 2

    It used the FreeBSD networking code. This doesn't mean windows is fast and it's sort of specious. BSD has tricks in the Kernel to make I/O faster than pretty much anything else.

    No it didn't. A few utilities that nobody used (e.g., the commandline ftp.exe, which doesn't even support PASV mode) were ported from BSD (not even FreeBSD), but the TCP/IP stack in Windows was not from BSD.

  21. Re:This does pose the question: by Score+Whore · · Score: 2

    Facebook buys custom servers, so the hardware will be 100% documented. Also they are of the vanity-free variety, lacking any bolted-on bits added strictly to make the numbered list of features on the side of the box longer. I suspect that the only things they are going to care about are disks and NICs. Sound cards, video cards, random USB hardware, bluetooth, none of that matters to them at all. These are datacenter-housed pieces of equipment.

  22. Re:This does pose the question: by lgw · · Score: 2

    Well, I only know this stuff second hand, but Facebook's dev motto used to be "go fast and break stuff". They had a very hard time hiring senior devs for some reason. Recently, they've had a bit of a (recruiting world) push to let people know that their new motto is "go fast with stable infra". I hear there used to be a banner on the wall proclaiming "don't test, just ship", but to be fair I haven't seen that myself.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  23. Re:Why not rebuild your ports? by Anonymous Coward · · Score: 5, Informative

    If you are managing a bunch (more than 2) of FreeBSD machines, building ports on the live system is wrong and counterproductive. In this kind of scenario you really should set up a package building machine and distribute your own binary packages to them. With pkgng this really works well.

    This applies if you don't want to use the official binary packages (for example because you need different options from the defaults, or you tweak some ports)

  24. Re:Why do you think they will contribute back? by phizi0n · · Score: 2

    The job posting specifically says they prefer to have the code in mainline.

  25. Re:Remember Microsoft Windows? by phoenix_rizzen · · Score: 2

    Windows NT used a STREAMS-based networking stack, culled from some other UNIX (not directly, but using the concepts and frameworks), not a BSD-derived networking stack.

    I have no idea how the DOS-based Windows networking stack developed. But it wasn't pulled from any BSD.

    A few command-line utilities (ftp.exe is the most common cited one) were pulled from BSD sources, though.

  26. It's not just NetMap that brings performance ... by Anonymous Coward · · Score: 3, Informative

    FreeBSD also includes an alternative to select/poll called kqueue that allows it to scale client connections massively with minimal performance degradation. Linux introduced epoll as a work-alike, but it has some drawbacks ...

    http://www.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
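
    As a hedged illustration of the readiness-notification model that kqueue and epoll share, here is a minimal sketch using Python's standard `selectors` module, whose `DefaultSelector` picks the most efficient mechanism the platform offers. The function name `echo_once_with_selector` is hypothetical, and a local socket pair stands in for a real client connection.

```python
import selectors
import socket


def echo_once_with_selector(payload: bytes) -> bytes:
    """Echo one message through a socket watched by the platform selector."""
    # DefaultSelector resolves to kqueue on FreeBSD/macOS and epoll on Linux.
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()
    client.settimeout(1.0)       # keep the client side simple and blocking
    server.setblocking(False)    # the watched side must never block
    sel.register(server, selectors.EVENT_READ)
    try:
        client.sendall(payload)
        # select() returns only the sockets that are ready, so the cost
        # does not grow with the number of idle registered connections.
        for key, events in sel.select(timeout=1.0):
            if events & selectors.EVENT_READ:
                data = key.fileobj.recv(4096)
                key.fileobj.sendall(data)
        return client.recv(4096)
    finally:
        sel.unregister(server)
        sel.close()
        client.close()
        server.close()


reply = echo_once_with_selector(b"ping")
```

    A server handling many connections would register every accepted socket with the same selector and loop on `select()`; this sketch registers only one to stay short.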

    What's a massive scale? WhatsApp, recently acquired by Facebook, uses FreeBSD and Erlang to power its service offerings. They sustain over 2 million simultaneous client connections per FreeBSD server ...

    http://blog.whatsapp.com/196/1-million-is-so-2011

    I wouldn't be surprised if the internal comparison between Linux and FreeBSD network features/performance was fueled by feedback from their new subsidiary.

    FreeBSD also works very closely with the Nginx community. If you look at the dev mailing list, you will see a fair amount of kernel-level dev work sponsored by companies that use nginx on top of FreeBSD. This constant tuning keeps nginx consumers loyal to FreeBSD for obvious reasons. It is no wonder that this combination was selected by Netflix to power their new content delivery network.

  27. Re:This does pose the question: by gangien · · Score: 2

    I interviewed there and had a dude with under a year of experience out of college interview me and, of course, give a horrible interview. I'm pretty sure that's the reason I didn't get the offer (of course, who knows, maybe there were other reasons). They have this idea that an engineer is an engineer, which I think is good to an extent, but in the end it's a bullshit philosophy.

  28. Re:This does pose the question: by CrankyFool · · Score: 2

    Two comments:

    1. Their internal motto was, in fact, "go fast and break stuff"; I know this first-hand because I talked with them about that at my interview back in February, where they mentioned that they've changed to "go fast and be bold" because, in fact, they were trying to lower incidents of availability hits;

    2. 20+ years of tech industry experience here, and I was totally ready to be interviewed by some snot-nosed kid. What I got instead was an interview panel whose average tech industry tenure was around 17 years. I was, uniformly, impressed with the caliber of the people I met with there -- they ranged from "pretty decent" in one case, to "pretty great" in all but two other cases, to "I'd take a $10K pay cut to work with this person" for the last two people. I was pretty surprised, and delighted.

  29. Re:This does pose the question: by TheRaven64 · · Score: 2

    NIC drivers interact directly with the network stack only via a set of well-defined interfaces (in FreeBSD at least), and writing the drivers is a one-shot operation because for new purchases they can demand FreeBSD drivers and support from their vendors (as Netflix, Verisign, and others do already). In the networking space, enough big companies are already demanding FreeBSD support for the high-end cards that the drivers are already likely to exist, so the missing ones are likely to be other things.

    --
    I am TheRaven on Soylent News
  30. Re:This does pose the question: by Mdk754 · · Score: 2

    Do you know why Linux seems to do so much better on VM NIC performance?

    Honestly curious, not trying to troll. In my ESXi tests I can manage more than double the throughput on a Linux VM compared to a BSD VM. Is there just crappy paravirtual NIC support in BSD?

    Tests were done with IPFire and pfSense on two different ESXi boxes, going to a virtual client/server on either end of the router to avoid gigabit wire speed limits. On my slower box, I got 6xxMB/s with pfSense and 2.xxGB/s with IPFire. Minimal speed difference in either OS when using vmxnet3 vs e1000. e1000 offered minimal throughput increase, vmxnet3 offered minimal host CPU usage decrease.