Jens Axboe On Kernel Development
BlockHead writes "Kerneltrap.org is running an interview with Jens Axboe, 15-year Linux veteran and the maintainer of the Linux kernel block layer, 'the piece of software that sits between the block device drivers (managing your hard drives, cdroms, etc) and the file systems.' The interview examines what's involved in maintaining this complex portion of the Linux kernel, and offers an accessible explanation of how IO schedulers work. Jens details his own CFQ, or Completely Fair Queuing scheduler, which is the default Linux IO scheduler. Finally, the article examines the current state of Linux kernel development, how it's changed over the years, and what's in store for the future."
FreeBSD dispensed with them altogether years ago...
Character devices only, thank you very much.
*Duck*
In Soviet Washington the swamp drains you.
"the piece of software that sits between the block device drivers (managing your hard drives, cdroms, etc) and the file systems."
That sounds REALLY hard. I'd be more interested if there's a development strategy he could recommend re: complex development projects.
I thought the title was: Ewe Boll On Kernel Development...
If core changes of such magnitude are no longer sufficient to merit a dev branch or even a major point release, why bother with the "2.6" designation at all? Just pull a Solaris and call the next release "Linux 20" or "Linux XX."
-Isaac
I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
I did a double take when I saw this, as Jens was an exchange student at my high school way back when. Small internet.
An excellent read!
There's something exciting about delving into low-level logic that gives you the feeling there's always something more to learn!
I guess always being two steps behind is the motivation that makes it all worthwhile.
15 year Linux veteran and the maintainer of the linux kernel block layer,...
In the interview he says he is now 30 years old. Wow, that means he started working on Linux at the age of 15 - a real prodigy. A very interesting interview.
Btw, it is nice that kerneltrap.org has finally had a makeover. The earlier website design looked rather drab.
Linux Help
for all things on Linux
I wonder if the originating process's priority is taken into account at all... It has always annoyed me that the "nice" (and especially the idle-only) processes are still treated equally when it comes to I/O...
In Soviet Washington the swamp drains you.
Anticipatory is, according to my menuconfig:
The anticipatory I/O scheduler is the default disk scheduler. It is
generally a good choice for most environments, but is quite large and
complex when compared to the deadline I/O scheduler, it can also be
slower in some cases especially some database loads.*
Anticipatory is also preselected with a fresh .config
JA: Is there anything else you'd like to add? Jens Axboe: Share the code! :)
http://askaralikhan.blogspot.com/
Am I the only one that misread that as "an interview with Jens Axboe, 15 year old Linux veteran" ?
Is this the part of the kernel that's responsible for making systems really slow during extended disk writes, while the CPU utilization is minimal?
Plugging itself is a mechanism to slightly defer starting an IO stream until we have some work for the device to do. You can compare it to inserting the plug in the bathtub before you fill it with water, the water will not flow out of the tub until you remove the plug again.
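For anyone who prefers the bathtub analogy in code, here is a toy Python sketch of the idea. It is not how the kernel implements plugging: the class name, the `unplug_threshold` knob and the choice to sort the batch by sector are all assumptions made up for illustration.

```python
from collections import deque


class PluggedQueue:
    """Toy request queue: holds requests while plugged, releases them on unplug."""

    def __init__(self, unplug_threshold=4):
        self.pending = deque()      # requests waiting behind the plug
        self.dispatched = []        # batches handed to the (imaginary) device
        self.plugged = False
        self.unplug_threshold = unplug_threshold  # hypothetical knob

    def submit(self, sector):
        # The first request into an empty queue inserts the plug.
        if not self.pending and not self.plugged:
            self.plugged = True
        self.pending.append(sector)
        # Once enough work has built up, pull the plug.
        if len(self.pending) >= self.unplug_threshold:
            self.unplug()

    def unplug(self):
        # Remove the plug and let the accumulated work flow out as one batch.
        self.plugged = False
        batch = sorted(self.pending)  # assumption: order the batch by sector
        self.pending.clear()
        self.dispatched.append(batch)
        return batch
```

The point of deferring is that the device gets one ordered batch instead of a trickle of single requests.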
At risk of starting a holy war, is there any reason why one approach would be superior? And do they lend themselves to different methods of scheduling? In TFA, Axboe talks about [1] the scheduling mechanism used in later versions of the 2.6 kernel series, which alleviates a problem that I (and most other people, probably) have run into before.
I'm curious because, although I don't use any of the 'real' BSDs very often, I spend most of my time (at home, anyway) using either Mac OS X, which uses the Mach/XNU kernel (which is derived from 4.3BSD, although I don't know if the I/O scheduler has been rewritten since then), or Linux with the 2.6 kernel, and it seems to me that OS X's disk I/O leaves something to be desired compared to Linux's.
Does BSD handle I/O differently in some fundamental fashion than Linux? It sounds like, by eliminating block devices, that they basically remove the kernel from doing any re-ordering or caching of data, which makes things "safer" (in the event of a crash) but seems like it would have big performance penalties when using drives that aren't very smart, and don't do a lot of caching and optimization on their own. It seems like getting rid of I/O scheduling altogether is a stiff price to pay for "safety."
[1] (quoting because there doesn't seem to be anchors in TFA)
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I think it would be more correct to say:
[This] is the part of the kernel that's responsible for making systems slightly less slow during extended disk writes, while the CPU utilization is minimal.
And even that's not quite true; where the scheduler really comes into play is when you have two or more processes trying to access the disk at the same time. During an extended, sustained read or write, the scheduler probably just needs to stay the hell out of the way and pass data as fast as it can.
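To make the "two or more processes" case concrete, here is a minimal Python sketch of fair queuing in its simplest form: one queue per process, served round-robin, one request at a time. This is only the general idea, not CFQ itself, which is considerably more involved; the function name and the `(pid, sector)` request shape are assumptions for illustration.

```python
from collections import OrderedDict, deque


def fair_dispatch(requests):
    """Take (pid, sector) requests in arrival order and return a dispatch
    order that serves one request from each process in turn."""
    queues = OrderedDict()
    for pid, sector in requests:
        queues.setdefault(pid, deque()).append(sector)

    order = []
    while queues:
        # One pass over every process that still has work queued.
        for pid in list(queues):
            order.append((pid, queues[pid].popleft()))
            if not queues[pid]:
                del queues[pid]
    return order
```

With a single busy process this degenerates to plain first-in, first-out, which matches the point above: the scheduler only earns its keep when there is contention.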
You could also say that, as a secondary priority, he's responsible for keeping the CPU utilization minimal during those disk writes...
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
As a native English speaker, comfortable with Spanish and aware of the basics of French (so I'm not entirely uneducated), I am entirely unequipped to reason the pronunciation of "Jens Axboe." Can someone help me out?
My Freakin Blog
Back in school we pronounced it with a "y" sound for the "j": "Yens" rhymed with "mens." Now, as to weather that was actually the correct pronunciation or merely something close enough that he didn't bother correcting us; I couldn't say.
Those guys, including Linus, are just fucking around with shit as a live experiment with users. They've already complained since 2.3 that people don't test the dev kernels enough. Not even bothering to attempt to prove if it works, they release the van Riel VM and it totally screws users for about six point releases before they remove it. We also get Linus saying he'll screw with the ABI intentionally to mess up binary-only modules and no real direction in terms of overall architecture stabilization.
I disagree. It's not too big. They just can't be bothered to manage it properly.
Thank you very much. Much of this article is informative, technical and really, really nerdy. I for one sit through dupes and rubbish like today's meaningless benchmarking of differing minor kernel versions in the hope of reading articles like this.
BTW, does anyone have a good set of benchmarks of the performance of different IO schedulers when running one or two or three IO intensive tasks, when running one intensive and many small tasks, etc.? That would actually help me decide whether to rebuild my kernel with CFQ.
Also, ionice would have made my old machine much more usable when doing backups... Oh well.
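On the rebuild question: on 2.6 kernels with more than one scheduler compiled in, the active scheduler for a device is exposed through sysfs in `/sys/block/<dev>/queue/scheduler`, with the active one shown in square brackets. A small Python sketch for reading that line follows; the function name and the sample string are assumptions for illustration, and the device name in the comment is hypothetical.

```python
def parse_schedulers(line):
    """Parse a line such as 'noop anticipatory deadline [cfq]' into
    (available_schedulers, active_scheduler)."""
    available = []
    active = None
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]   # brackets mark the active scheduler
            active = name
        available.append(name)
    return available, active


# On a real machine the line would come from sysfs, for example
# (hypothetical device name):
#   with open("/sys/block/sda/queue/scheduler") as f:
#       print(parse_schedulers(f.read()))
```

If CFQ already appears in that list, no rebuild should be needed to try it.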
# cat
Damn, my RAM is full of llamas.
Now, as to weather that was actually the correct pronunciation or merely something close enough ...
;-)
Close enough.
Are there any hard metrics on what the performance advantages are of various schedulers, under typical load conditions?
Reading TFA piqued my interest in I/O scheduling and I've been doing some reading on it, and it seems like there are several competing schools of thought, of which Axboe's (and potentially that of the Linux kernel developers generally) is only one.
An alternative view, such as this from Justin Walker (a Darwin developer) on the darwin-kernel mailing list, holds that it's not worthwhile for the OS kernel to do much disk scheduling, since "the OS does not have a good idea of the actual disk geometry and other performance characteristics, and so we [kernel developers] leave that level of scheduling up to the controllers in the disk drive itself. I think, for example, that recent IBM drives have some variant of OS/2 running in the controller. Since the OS knows nothing about heads, tracks, cylinders for modern commodity disks, it's futile to try to schedule I/O for them." (written Mar 2003)
Axboe seems to acknowledge that this may sometimes be the case, because they do have the 'non-scheduling scheduler,' which he recommends only for use with very intelligent hardware. However, it seems like some people think that commodity drives are already 'smart enough' to do their own scheduling.
It seems like determining which approach was superior would be relatively straightforward, and yet I've never seen it done (although maybe I'm just not looking in the right places). Anecdotally, I'm tempted to agree with Axboe, since it seems like, when doing things where several processes are all thrashing the disk simultaneously, my Linux machine feels faster than my OS X one, but this is by no means scientific (they don't have the same drives in them, not working with the same datasets, etc.).
On what drives, and under what conditions, is it advantageous to have the OS kernel perform scheduling, and on which ones is it best just to pass stuff to the drive and let the controller do all the thinking?
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I am not proficient in phonetic writing. (Or whatever it is called..)
But since Jens is a Dane, like myself, I'll give it a shot.
Yens Aksbo
Where Yens is pronounced with the stress on the e.
Yêns
And boe is pronounced without the e, and with the stress on the a.
âksbo
Hope this helps.
I mean, if you are following bleeding edge kernels and complaining that they aren't as stable as you'd like, why not just follow a vendor's kernel?
And people still wonder why major vendors slowly but steadily fully drop customer distributions... RedHat dropped its stuff and moved it into Fedora, SuSE dropped theirs and moved it into 'OpenSuSE'. Why?
My theory is simple: the model doesn't just plain suck from an end user's point of view, it also puts a massive overhead on companies who try to actually run their business on Linux. It keeps amazing me, time and time again, how easily people reach conclusions like "So run a vendor kernel" without seemingly even the will to realize that the burden on the end user also applies to the vendor. The big difference is that while it may simply annoy the end user, it can really hurt the company.
Linux had better get its act together, otherwise I foresee it sinking right back into the depths it came from while it's being fully overrun by massive discussions in a desperate attempt to make it more commonly appealing again. Because if I compare both Linux and Solaris at this point, I see a lot more stable software being released by Sun. It's free, it's more reliable, and if I wish to fork their software to run something on my own it doesn't put the same massive amounts of overhead on my shoulders. Granted, don't get me wrong here, naturally it's not perfect. Even Sun has its flaws, but right now one of the most heard complaints is that they promise and only deliver at a much later moment. That protects stability and reliability, but it doesn't really help the people who were actually interested in those features.
And then there's of course also the BSD tree. I can't fully comment on these since my experience is very limited. But here too you see some "cutting edge" environments (if you want them to be) like FreeBSD, but also environments which take it much slower and rely on robustness and stability (OpenBSD for example).
So, long way to basically say that I don't think your comment is fair. Please try to think beyond your own little space. What applies to the end user applies to the vendor. And the latter is a very important factor when it comes to boosting Linux.
Jens is NOT pronounced "Djens". "J" is pronounced as a palatal approximant in Danish - just like "y" in English. Yens is somewhat more correct, but the "e" has to be pronounced like the IPA [æ]. Danish is not logical at all. If it were, "Jens" would be spelled with an "æ". Take a look at Jens.
IPA: [jæns]
Axboe is more complicated:
IPA transcription of Axboe would be something like: Open back unrounded vowel + [ksbø]
(I can't get the IPA sign for "Open back unrounded vowel" to display in Slash)
"It sounds like, by eliminating block devices, that they basically remove the kernel from doing any re-ordering or caching of data, which makes things "safer""
No; FreeBSD's shifted the buffer cache away from individual devices and into the filesystem/VM, where it caches vnodes rather than raw data blocks. The IO queue (below all this block/character/GEOM stuff) is scheduled using a standard elevator algorithm called C-LOOK. It's showing its age in places, and there's been some effort towards replacing/improving it, making it pluggable etc (e.g. Hybrid); sadly it's a tricky problem to solve properly. See this recent thread.
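For readers who haven't met C-LOOK: the head sweeps in one direction only, serving pending requests in ascending order, then jumps back to the lowest pending request and sweeps up again. A textbook Python sketch of that ordering follows; it is not FreeBSD's actual code, and the function name and the use of bare sector numbers are assumptions for illustration.

```python
def c_look_order(head, pending):
    """Return the order in which a C-LOOK elevator would serve the
    pending sector numbers, given the current head position."""
    # Requests at or beyond the head are served first, in ascending order.
    ahead = sorted(s for s in pending if s >= head)
    # Then the head jumps back to the lowest request and sweeps upward again.
    behind = sorted(s for s in pending if s < head)
    return ahead + behind
```

For example, with the head at 50 and requests for 10, 95, 60, 20 and 70 pending, the sweep serves 60, 70, 95 and then wraps around to 10, 20.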
That's awesome! Thanks.
My Freakin Blog
I made a mistake. In this case (Axboe) "oe" is just pronounced "o". This is the Close-mid back rounded vowel IPA: [o].
seriously, 2.6.19 and above is unusable on my computer, because the i2c driver crashes ipw2200
not enough testing is being done. you people are ruining your own kernel
I was a little disappointed when he said filesystems like Reiser4 and ZFS don't affect the block layer. I'm not sure about ZFS, either, but I do know that Reiser4 can do stuff above and beyond what the block device layer can do, these days.
How do I know? Why, it's on the Namesys webpage!
Hi John!
That is correct, like a "y", rhymes with "mens". I saw another question on the last name; I typically tell foreigners that it is pronounced ax-bow. Europeans often think the 'oe' is like the Danish "ø", however that is not the case.
Hey there, good to see you still exist! So, is kernel.dk always STFU or is that just up for /.ing?