Facebook Seeks Devs To Make Linux Network Stack As Good As FreeBSD's
An anonymous reader writes: Facebook posted a career application which, in their own words, is 'seeking a Linux Kernel Software Engineer to join our Kernel team, with a primary focus on the networking subsystem. Our goal over the next few years is for the Linux kernel network stack to rival or exceed that of FreeBSD.' Two interesting bullet points listing "responsibilities": Improve IPv6 support in the kernel, and eliminate perf and stability issues. FB is one of the world's largest IPv6 deployments; Investigate and participate in emerging protocols (MPTCP, QUIC, etc.) discussions, implementation, experimentation, tooling, etc.
Why not use FreeBSD? It's already there and at least as good as Linux. Or have they perhaps hung themselves on systemd?
It might be a silly question, but why don't they just use FreeBSD in that case?
Look, this is FreeBSD ... why not just take their damned code?
It's not like you're not allowed to do that. That's what is great about the BSD license.
If FreeBSD's network stack is what you aspire to, why reinvent the wheel?
Lost at C:>. Found at C.
Insulting their work might not be the right way to get the best Linux kernel network engineers to join your company.
What makes the FreeBSD network stack superior?
Why is the organizational separation between the kernel and the userland going to affect the quality of the TCP/IP stack?
It used the FreeBSD networking code. This doesn't mean Windows is fast, and it's sort of specious. BSD has tricks in the kernel to make I/O faster than pretty much anything else.
I don't understand why there are all these comments saying they should just use FreeBSD. There are many reasons to despise Facebook, but their desire to improve the Linux networking stack is admirable. We should be encouraging corporations to contribute to OSS, not telling them to just use that other thing that is better in some ways but not others. Kudos to them for contributing back to the projects they use.
What effect does the userland have on the TCP/IP stack?
Designing algorithms that play well in an SMP environment under heavy loads is not easy. It isn't just a matter of locking within the protocol stack... contention between CPUs can get completely out of control even from small 6-instruction locking windows. And it isn't just the TCP stack which needs to be contention-free. The *entire* packet path, from the hardware all the way through to the system calls made by userland, has to be contention-free. Plus the scheduler has to be able to optimize the data flow to reduce unnecessary cache mastership changes.
It's fun, but so many kernel subsystems are involved that it takes a very long time to get it right. And there are only a handful of kernel programmers in the entire world capable of doing it.
-Matt
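To make the contention point a bit more concrete, here is a minimal sketch of one common way to avoid a single hot lock: give each worker its own shard of a counter and only combine the shards on the rare read. The names `ShardedCounter` and `run_workers` are made up for illustration, and this is not how any particular kernel does it. Python threads also won't show the cache-line effects described above, so treat it purely as a picture of the idea.

```python
import threading


class ShardedCounter:
    """Packet counter split into per-worker shards (hypothetical illustration)."""

    def __init__(self, shards):
        self._counts = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]

    def add(self, shard, n=1):
        # Hot path: only this shard's lock is taken, so workers that use
        # different shards never wait on each other here.
        with self._locks[shard]:
            self._counts[shard] += n

    def total(self):
        # Cold path: visit every shard, holding one lock at a time.
        result = 0
        for index, lock in enumerate(self._locks):
            with lock:
                result += self._counts[index]
        return result


def run_workers(workers, packets_per_worker):
    """Start one thread per shard, count 'packets', and return the grand total."""
    counter = ShardedCounter(workers)

    def work(shard):
        for _ in range(packets_per_worker):
            counter.add(shard)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

The trade-off is that reads become more expensive and only approximately consistent while writers are running, which is usually acceptable for statistics but not for every kind of shared state.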
Except when you start talking about netmap. :) That's a userspace network stack that can push millions of pps, on sub-GHz systems.
There's even a netmap-enabled version of the IPFW packet filter that runs in userspace, filtering millions of pps on sub-GHz systems.
And there's an applications ecosystem starting to grow around netmap that keeps all network-related packet processing in userspace.
As a twist, netmap and IPFW are also available on Linux, and provide better performance than the in-kernel network stack and iptables. :)
Actually, that's not always true. FreeBSD ships with netmap, which allows you to talk to the network hardware directly from userspace. A significant chunk of the DNS root zone is served by FreeBSD boxes using a completely custom TCP/IP stack on top of netmap. There's a paper at this year's SIGCOMM about building specialised network stacks in this infrastructure.
If you're talking about the FreeBSD TCP/IP stack, then libuinet allows running it entirely in userspace.
These might not be the ones that Facebook is interested in, but a significant amount of their workload could be sped up by using the work described in the SIGCOMM paper...
I am TheRaven on Soylent News
And why would it be such a bad thing, even if you had to? I maintain a handful of ports myself and people's complaining about rebuilding them always irks me — what am I missing?
There are, of course, precompiled binary packages for almost all the ports (where licensing allows redistribution of such binaries). But I don't use them myself — building everything from source. Why not?
In Soviet Washington the swamp drains you.
The job posting specifically says they prefer to have the code in mainline.
Hardware support, in particular, is very far behind
Amusingly, I had the opposite problem years back. The wireless drivers and stack were better on BSD, while on Linux they were hard to find, and often you ended up having to use ndiswrapper, which was a nightmare (often resulting in a decision between "do I upgrade my kernel and fix X, or keep my kernel and have working wifi").
Just taking a quick peek, it looks like USB3.0 works, but it depends on your host-controller whether it's supported or not.
FreeBSD also includes an alternative to select/poll called kqueue that allows it to scale client connections massively with minimal performance degradation. Linux introduced epoll as a work-alike, but it has some drawbacks ...
http://www.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
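For anyone who hasn't used either interface, a small sketch of the readiness-notification pattern both are built for may help. It uses Python's standard `selectors` module, whose `DefaultSelector` picks a kqueue-based selector on the BSDs and an epoll-based one on Linux when those are available. The helper names `backend_name` and `echo_once` are made up for illustration, and the sketch says nothing about the relative performance or the drawbacks discussed in the link.

```python
import selectors
import socket


def backend_name():
    """Return the class name of the selector this platform defaults to."""
    sel = selectors.DefaultSelector()
    try:
        return type(sel).__name__
    finally:
        sel.close()


def echo_once(payload):
    """Send payload over a local socket pair and read it back via the selector."""
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        b.setblocking(False)
        # Ask to be told when b has data to read, instead of polling it.
        sel.register(b, selectors.EVENT_READ, data="peer")
        a.sendall(payload)
        received = b""
        for key, mask in sel.select(timeout=1.0):
            if mask & selectors.EVENT_READ:
                received += key.fileobj.recv(4096)
        return received
    finally:
        sel.close()
        a.close()
        b.close()
```

A real server would register thousands of sockets and loop on `select()`. The point of kqueue and epoll is that the cost of each wakeup tracks the number of ready sockets rather than the number of registered ones.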
What's a massive scale? WhatsApp, recently acquired by Facebook, uses FreeBSD and Erlang to power its service offerings. They sustain over 2 million simultaneous client connections per FreeBSD server ...
http://blog.whatsapp.com/196/1-million-is-so-2011
I wouldn't be surprised if the internal comparison between Linux and FreeBSD network features/performance was fueled by feedback from their new subsidiary.
FreeBSD also works very closely with the Nginx community. If you look at the dev mailing list, you will see a fair amount of kernel-level dev work sponsored by companies that use nginx on top of FreeBSD. This constant tuning keeps nginx consumers loyal to FreeBSD for obvious reasons. It is no wonder that this combination was selected by Netflix to power their new content delivery network.
There are a few "hacks" that do that in Linux as well. Basically the network driver is in userland, with a kernel shim to handle the DMA and IRQs, which aren't available to userland. (In fact, I use that same mode to deal with Broadcom SoCs -- not for network traffic, just to configure and monitor.) There are advantages to pulling packets directly to userland.
How much of Facebook's code is not platform-independent already? Most of their web infrastructure is PHP with a custom compiler. The compiler may need some tweaks, but the PHP code that it compiles should be completely independent. Drivers are a different issue, but if they're employing a load of kernel devs it's a lot easier to write a few missing drivers than make large and invasive changes to the network stack.
Out of interest, how many cores are you using? I've not used Solaris for... a long time, but their network stack was redesigned a little while ago to do each layer in a different thread with lockless ring buffers between them all. I'd imagine latency might suffer a bit, but throughput should be insane with enough hardware contexts. On something like the UltraSPARC Tx series, I'd expect it to outperform FreeBSD, but I've never seen numbers.
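The "one thread per layer, ring buffer between layers" design can be sketched roughly as below. This is only an illustration of the general single-producer/single-consumer ring idea, not Solaris code: `SpscRing`, `stage` and `run_pipeline` are hypothetical names, the two "layers" are stand-in arithmetic functions, and the lock-free behaviour leans on CPython's global interpreter lock. A real implementation would need proper memory barriers.

```python
import threading
import time

_STOP = object()  # sentinel passed down the pipeline to shut each stage down


class SpscRing:
    """Fixed-size ring for exactly one producer thread and one consumer thread."""

    def __init__(self, capacity):
        self._buf = [None] * (capacity + 1)  # one slot stays empty: full != empty
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False  # ring is full
        self._buf[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been filled
        return True

    def try_pop(self):
        if self._head == self._tail:
            return False, None  # ring is empty
        item = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        return True, item


def stage(fn, inbox, outbox):
    """One 'layer': pop from inbox, apply fn, push to outbox, until _STOP."""
    while True:
        ok, item = inbox.try_pop()
        if not ok:
            time.sleep(0)  # nothing to do yet; yield to other threads
            continue
        out = item if item is _STOP else fn(item)
        while not outbox.try_push(out):
            time.sleep(0)  # downstream ring is full; yield and retry
        if item is _STOP:
            return


def run_pipeline(items):
    """Feed items through two stages (add 1, then double) and collect results."""
    r1, r2, r3 = SpscRing(8), SpscRing(8), SpscRing(8)
    threads = [
        threading.Thread(target=stage, args=(lambda x: x + 1, r1, r2)),
        threading.Thread(target=stage, args=(lambda x: x * 2, r2, r3)),
    ]
    for t in threads:
        t.start()
    pending = list(items) + [_STOP]
    sent, results = 0, []
    while True:
        # The calling thread is the only producer for r1 and consumer for r3.
        if sent < len(pending) and r1.try_push(pending[sent]):
            sent += 1
        ok, item = r3.try_pop()
        if ok:
            if item is _STOP:
                break
            results.append(item)
        else:
            time.sleep(0)
    for t in threads:
        t.join()
    return results
```

Each hop adds a handoff between threads, which is where the latency cost guessed at above would come from, while every stage can keep working as long as its input ring is non-empty.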
...slightly less, now.
Which works great if you're writing all your own software, or are able to rely on nothing but open source. VERY few companies have that luxury, and FreeBSD/OpenBSD are a non-starter for anything you want vendor support on. RHEL and SLES on the other hand are almost universally supported by hardware and software vendors alike.
I remember seeing numbers at some point. It didn't impress in a single thread, but could easily saturate a 10Gb link in multi-threaded tests. They tested an FTP server on a T2plus. Regarding cores, we have anything from dual UltraSPARC IIIi to T4-based systems, including some M-class. I believe the T3-4 has the highest number of cores. It should be 64 cores and 512 threads, but a single Solaris instance can only see 256. I believe that the M9000 and M9000-64 should have the same problem, but the biggest M series I've worked with is the M8000.
UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
My objection was not to merits or lack thereof of a particular OS, but to the practice of placing the burden of research on the audience (and opponents).
Whatever it is you are stating should be backed by evidence. It is best to include the links with the statement being supported, but it can be tedious. So, links should be provided upon request, without any lip like "just google it yourself"...