How Google Uses Linux


Posted by timothy on Saturday November 7, 2009 @08:39AM from the rebasing-not-freebasing dept.

postfail writes 'lwn.net coverage of the 2009 Linux Kernel Summit includes a recap of a presentation by Google engineers on how they use Linux. According to the article, a team of 30 Google engineers is rebasing to the mainline kernel every 17 months, presently carrying 1208 patches to 2.6.26 and inserting almost 300,000 lines of code; roughly 25% of those patches are backports of newer features.'

Release the patches already by Dice · 2009-11-07 08:52 · Score: 5, Interesting

They monitor all disk and network traffic, record it, and use it for analyzing their operations later on. Hooks have been added to let them associate all disk I/O back to applications - including asynchronous writeback I/O.
I. Want. This.
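
Mainline Linux already exposes a much coarser version of this: per-process I/O counters in /proc/<pid>/io, available when the kernel is built with task I/O accounting. This is not Google's patch set and it does not attribute individual requests the way the article describes; it is only a minimal sketch of reading those counters. The helper names parse_proc_io and read_proc_io are made up for illustration.

```python
from pathlib import Path
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a simple 'name: integer' line.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid: str = "self") -> Dict[str, int]:
    """Read I/O counters for a process; returns {} if the file is unavailable."""
    try:
        return parse_proc_io(Path(f"/proc/{pid}/io").read_text())
    except OSError:
        # Not Linux, accounting disabled, or no permission for that pid.
        return {}
```

Calling read_proc_io().get("write_bytes") on a Linux box gives the bytes the current process has caused to be sent to the storage layer. Reading another process's file generally needs matching ownership or root.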
Re:Togh by MichaelSmith · 2009-11-07 09:30 · Score: 2, Interesting

TFA does suggest, though, that Google have gotten themselves into a horrible mess with their local changes and would be better off offloading their stuff to the community and taking properly integrated releases.

--
http://michaelsmith.id.au
Is it worth it? by ToasterMonkey · 2009-11-07 09:38 · Score: 2, Interesting

The whole article sounds so painful, what do they actually get out of it?

Google started with the 2.4.18 kernel - but they patched over 2000 files, inserting 492,000 lines of code. Among other things, they backported 64-bit support into that kernel. Eventually they moved to 2.6.11, primarily because they needed SATA support. A 2.6.18-based kernel followed, and they are now working on preparing a 2.6.26-based kernel for deployment in the near future. They are currently carrying 1208 patches to 2.6.26, inserting almost 300,000 lines of code. Roughly 25% of those patches, Mike estimates, are backports of newer features.

In the area of CPU scheduling, Google found the move to the completely fair scheduler to be painful. In fact, it was such a problem that they finally forward-ported the old O(1) scheduler and can run it in 2.6.26. Changes in the semantics of sched_yield() created grief, especially with the user-space locking that Google uses. High-priority threads can make a mess of load balancing, even if they run for very short periods of time. And load balancing matters: Google runs something like 5000 threads on systems with 16-32 cores.

Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
Ooooh... efficiency.. I'm curious what the net savings is.. compared to buying more cheap hardware.

So what is Google doing with all that code in the kernel? They try very hard to get the most out of every machine they have, so they cram a lot of work onto each.
(30 * kernel engineer salary) / (generic x86 server + cooling + power) = ?
1. Re:Is it worth it? by dingen · 2009-11-07 09:57 · Score: 5, Interesting
  
  Ooooh... efficiency.. I'm curious what the net savings is.. compared to buying more cheap hardware.
  We're talking about Google here. They have dozens of datacenters all over the globe, filled with hundreds of thousands of servers. Some estimate even a million servers or more.
  So let's assume they do indeed have a million servers and they need 5% more efficiency out of their server farms. Following your logic, it would be better to add 50,000 (!) cheap servers which consume space and power and require cooling and maintenance, but I'll bet you paying a handful of engineers to tweak your software is *a lot* cheaper. Especially since Google isn't "a project" or something. They're here for the long run. They're here to stay, and in order to make that happen, they need to get as much from their platform as possible.
  
  --
  Pretty good is actually pretty bad.
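
The back-of-envelope comparison in this exchange can be written out as a rough sketch. Only the fleet size (one million), the 5% gain, and the 30 engineers come from the thread; the function names and any cost figures plugged in are assumptions for illustration, not real Google numbers.

```python
def extra_servers(fleet_size: int, efficiency_gain: float) -> int:
    """Servers you would have to add to match a given efficiency gain."""
    return round(fleet_size * efficiency_gain)


def cost_ratio(engineers: int, cost_per_engineer: float,
               servers: int, cost_per_server: float) -> float:
    """Engineering cost divided by hardware cost.

    A value below 1 means paying the engineers is the cheaper option.
    Both cost inputs should cover the same period (e.g. one year).
    """
    return (engineers * cost_per_engineer) / (servers * cost_per_server)
```

For example, with placeholder figures of 200,000 per engineer per year and 3,000 per server per year (hardware, power, and cooling lumped together), cost_ratio(30, 200_000, extra_servers(1_000_000, 0.05), 3_000) comes out at about 0.04. The conclusion is only as good as those two guesses, so swap in your own.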
2. Re:Is it worth it? by LordNimon · 2009-11-07 10:19 · Score: 4, Interesting
  
  Porting patches from one kernel version to another is not innovation.
  
  A while back I got an invitation to work for Google as a kernel developer. I declined to interview, because I already had a job doing just that. This article makes me glad I never accepted that offer. I feel sorry for those kernel developers at Google. Porting all that code back-and-forth over and over again. Now *that's* a crappy job.
  
  --
  And the men who hold high places must be the ones who start
  To mold a new reality... closer to the heart
3. Re:Is it worth it? by Taur0 · 2009-11-07 10:30 · Score: 2, Interesting
  
  I really hope you're not an engineer, because your solution to a problem should never be: "Screw the most efficient solution, we'll just go out and buy more and waste more energy!" These incremental increases in efficiency will drastically change a product over time; look at cars, for example. The countless engineers working at GM, Toyota, Ford, etc. could have easily said: "meh whatever, just make them buy more gas". The modern combustion engine is only about 30% efficient, but that's far better than when the combustion engine was first thought of, which was somewhere around 0.4%.
Low memory conditions by jones_supa · 2009-11-07 09:45 · Score: 5, Interesting

Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
This is something I have been wondering too. Doesn't it just lead to applications crashing more often, rather than reporting in the normal way that they cannot allocate more memory?
Re:Does Google give coade back by marcansoft · 2009-11-07 10:50 · Score: 4, Interesting

Andrew Morton, Google employee and maintainer of the -mm tree, contributed the vast majority of the changes filed under "Google" (and most of those changes aren't Google-specific - Andrew has been doing this since before he was employed there). If you subtract Andrew, Google is responsible for a tiny part of kernel development last I heard, unfortunately.
Re:The Win32 Way by Sam Douglas · 2009-11-07 11:20 · Score: 2, Interesting

In Unix, if malloc() returns NULL then the memory allocation failed and you don't have the memory. A well-written program should check that. Overcommitting memory can have efficiency advantages, but things can also turn out badly. Linux has heuristics to determine how much memory to overcommit, or overcommit can be disabled entirely.
http://utcc.utoronto.ca/~cks/space/blog/unix/MemoryOvercommit
http://utcc.utoronto.ca/~cks/space/blog/linux/LinuxVMOvercommit
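
A minimal sketch of both halves of that point, in Python rather than C: reading which overcommit policy the kernel is using from /proc/sys/vm/overcommit_memory, and handling a refused allocation (MemoryError here plays the role of malloc() returning NULL). The function names are made up for illustration; the three mode values are the ones documented for the vm.overcommit_memory sysctl.

```python
from pathlib import Path
from typing import Optional

# Meanings of the vm.overcommit_memory sysctl values.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit past the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Map the text of /proc/sys/vm/overcommit_memory to a description."""
    return OVERCOMMIT_MODES.get(int(raw.strip()), "unknown mode")


def current_overcommit() -> str:
    """Describe the running kernel's policy, or say why it can't be read."""
    try:
        return describe_overcommit(
            Path("/proc/sys/vm/overcommit_memory").read_text())
    except (OSError, ValueError):
        return "unavailable (not Linux, or file unreadable)"


def try_allocate(n_bytes: int) -> Optional[bytearray]:
    """Return a buffer, or None if the allocation request is refused."""
    try:
        return bytearray(n_bytes)
    except MemoryError:
        return None
```

Under modes 0 and 1 a large try_allocate() call may well succeed and the process can still be picked off by the OOM killer later, once the pages are actually used; that is the behaviour being questioned upthread. Under mode 2 the failure is more likely to show up at allocation time, where the program can handle it.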
Re:Togh by grcumb · 2009-11-07 12:30 · Score: 2, Interesting

The Linux dev model is NOT something to be proud of.
Indeed:
"The Linux dev model is the worst form of development, except for all those other forms that have been tried from time to time." - Winston Churchill
... Oh wait, no. That was me, actually.
Holy humour-impaired down-modding, Batman! How is the above a troll?
For those too dense to get the joke: I actually agree that the Linux development model has significant weaknesses. It's just that, despite its shortcomings, it actually has proven workable for many years now.
I'm not implying that there aren't better community-driven coding projects in existence. Nor do I want to suggest that critiquing the community is unwarranted (or even unwanted). It's just that, for all its warts, it has produced consistent results over the years.

--
Crumb's Corollary: Never bring a knife to a bun fight.
Real example... by Fished · 2009-11-07 13:42 · Score: 4, Interesting

Back in the 90's, we had a customized patch to Apache to make it forward tickets within our intranet as supplied by our (also customized) Kerberos libraries for our (also customized) build of Lynx. It all had to do with a very robust system for managing customer contacts that ran with virtually no maintenance from 1999 to 2007--and I was the only person who understood it because I wrote it as the SA--when it was scrapped for a "modern" and "supportable" solution that (of course) requires a dozen full-time developers and crashes all the time.
Not really bitching too much, because that platform was a product of the go-go 90's, and IT doctrine has changed for the better. No way should a product be out there with all your customer information that only one person understands. But it was a sweet solution that did its job and did its job well for a LONG time. Better living through the UNIX way of doing things!
But, anyway, I never bothered to contribute any of the patches from that back to the Apache tree (or the other trees) because they really only made sense in that particular context and as a group. If you weren't doing EXACTLY what we were doing, there was no point in the patches, and NOBODY was doing exactly what we were doing.

--
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
Re:Does Google give coade back by Anonymous Coward · 2009-11-07 18:05 · Score: 1, Interesting

Hiring someone to keep doing what they were already doing doesn't make you a kernel contributor.
I disagree with this statement. He's being paid to work on the kernel. What's the difference?