Con Kolivas Returns, With a Desktop-Oriented Linux Scheduler
myvirtualid writes "Con Kolivas has done what he swore never to do: returned to the Linux kernel and written a new — and, according to him — waaay better scheduler for the desktop environment. In fact, BFS appears to outperform existing schedulers right up until one hits a 16-CPU machine, at which point he guesses performance would degrade somewhat. According to Kolivas, BFS 'was designed to be forward looking only, make the most of lower spec machines, and not scale to massive hardware. i.e. [sic] it is a desktop orientated scheduler, with extremely low latencies for excellent interactivity by design rather than 'calculated,' with rigid fairness, nice priority distribution and extreme scalability within normal load levels.'"
Why would the summary omit this precious bit of information?
Great news :-) Now, will the kernel people, with Mr. Torvalds at their head, restart the whole debate on pluggable schedulers? Since his scheduler, as he says, degrades beyond 16 CPUs, better options already exist for servers, where I am guessing CFS is used. So he may be back, but is the road ahead still as steep?
May I be the first to say "amen"? I've been very dissatisfied with the 2.6 kernel and its schedulers on the desktop, CFS in particular. CFS seems entirely braindead for desktop use compared to the older schedulers in 2.4 and yes, even 2.2.
A desktop machine needs to be, first and foremost, responsive. If it isn't, it's comparable to the cursor freezing and input taking several seconds to appear: on today's hardware, one might start to think "hey, did it freeze on me?" - completely unacceptable.
Maybe it can be chalked up to the non-priority of X and video at the kernel level; I don't know. Whatever it is, it used to be better, on very pathetic (133MHz) hardware, while doing a lot more (and when such hardware was not all that powerful anymore, as well).
My question is: is it in the kernel tree yet? Is this that 2.6.31 scheduler change I heard about earlier yesterday, or is it something Completely Different?
Oh yeah, and which other schedulers, if any, did this guy write?
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
I smell another LKML flamewar coming....
This comment is fully compliant with RFC 527.
What is it?
BFS is the Brain Fuck Scheduler.
yeah...
This is news? Linux has been playing musical schedulers for years now. I've yet to be impressed by any of them, for any use, with any hardware.
I tripped over his site about 4 days ago, so this is old news to me (well, 4 days old anyway). I didn't try adding his code to the kernel and doing a compile (I'm running Linux Freedom 2.6.31-rc8-git2 #1 SMP Fri Sep 4 19:22:35 MDT 2009 x86_64 GNU/Linux built with gcc version 4.4.1 (4.4.1) on Ubuntu 9.04, on a Core i7-920). I might give it a try though. I also found out what BFS stood for. I might give it a try anyway.
Clearly, Desktop Linux and Server Linux have some things in common, but they also have different needs. I'm not intimately familiar with any kernel programming, but I do have some basic understanding of how it all works, and even I find it relatively easy to understand that the needs of a good and snappy desktop and those of a reliable server are going to have some differences.
I think it is past time that some sort of kernel operating-mode optimizations, like this scheduler thing for the desktop, were made available, even if the defaults are for servers.
While I think it's great for Linux to have more choice in schedulers, I don't understand Con's spec at all:
Hang on there, something's not right with that logic:
1) If you're forward looking, how can you not scale beyond 16 cores? We're already at 8 cores on home boxes.
2) And if you're forward looking only, then how come you're looking backwards at lower spec machines?
Whoever wrote that piece just produced a sound bite that's logically meaningless.
Finally a worthy brainfuck program! ++!
(See http://en.wikipedia.org/wiki/Brainfuck)
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
He's like the Brett Favre of linux kernel schedulers!
hmm.. wrong place to use football reference?
who am I kidding.. I don't watch football
Took me a while to figure out what "forward looking" means in this context, since "forward-looking scheduler" doesn't seem to be common terminology, and I assumed he wasn't talking about his grand forward-looking vision for schedulerdom.
Based on some previous arguments he's had, it sounds like he opposes the common heuristic of upping interactive process priority by keeping track of how long processes sleep--- processes that sleep a lot are probably I/O bound, and should get a priority boost so they can run on the (less frequent than for CPU-bound processes) occasions when they're ready. Kolivas wants schedulers to be forward-looking in the sense that they decide how to schedule without looking at process run history, by looking purely at who's ready to run, available timeslices, priorities, etc.
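The contrast described above can be sketched in a few lines of Python. This is only an illustration of the two ideas, not BFS or CFS code: the Task fields, the sleep bonus formula, and both function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    nice: int             # static priority, lower runs first
    queued_at: int        # tick at which the task became runnable
    slept_ticks: int = 0  # recent sleep time (history), hypothetical field


def pick_history_based(ready):
    """Sleep heuristic: tasks that slept a lot get a priority boost."""
    def effective(task):
        # Hypothetical bonus: one level per 10 ticks slept, capped at 5.
        bonus = min(task.slept_ticks // 10, 5)
        return (task.nice - bonus, task.queued_at)
    return min(ready, key=effective)


def pick_forward_looking(ready):
    """No history: only static priority and queue order are consulted."""
    return min(ready, key=lambda task: (task.nice, task.queued_at))


ready = [
    Task("compiler", nice=0, queued_at=1, slept_ticks=0),
    Task("editor", nice=0, queued_at=2, slept_ticks=40),
]
print(pick_history_based(ready).name)    # the sleepy task jumps the queue
print(pick_forward_looking(ready).name)  # the earlier-queued task runs
```

With equal nice levels, the history-based picker favours whichever task slept most, while the forward-looking picker simply serves the queue in order, so its behaviour does not change with what the tasks did in the past.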
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Who cares about his scheduler focus when the performance bottlenecks, and especially his 'examples', are mostly I/O related? If a make -j 4 makes your quad-core machine choke, it's indeed a BRAINFUCKED coder who made it... CPU cycles should be plentifully available in between the I/O operations. Still wonder why he's able to get that much media attention... luckily nobody on LKML cares either...
I for one, welcome our new Bloody Fucking Scheduler overlords!
The content of http://ck.kolivas.org/patches/bfs/bfs-faq.txt below, geeks wanna know
FAQS about BFS. v0.209
Why did I write it?
After years of using my old kernel and numerous hardware upgrades, I finally
had hardware that needed a newer kernel for drivers and to try out the newer
filesystems. Booting the mainline kernel was relatively reassuring in that
the scheduler behaviour was much better than what was in earlier kernels.
However, it didn't take long before I started being disappointed in that too.
Random stalls in mouse movements, keypresses, strange cpu distribution in
various workloads and unpredictable behaviour all around were exactly what I
was hoping had gone away. So I did what I vowed never to do, looked at the
code. After seeing it had grown into a monster of epic proportions I sat down
and thought about what was wrong. One of the key features of fairness and
interactivity that I always argued for were very simple semantics for how
cpu should be distributed, with guaranteed low latencies so that interactivity
was assured by design instead of bolted on. CFS in essence does that, but it
does something else too. It varies timeslice length to try and preserve some
deadline list and it determines cpu distribution based on a run/sleep
relationship. It also is designed to scale to monster proportion hardware
that the common man will never see. The whole sleep calculation thing is
exactly what I found was responsible for making varied behaviour under
different loads and relative starvation and unfairness. It's not a profound
effect in CFS and that's admirable. It just doesn't behave the way I feel
the scheduler should being forward looking only (not calculating sleep) and
it doesn't really make the most of a relatively lightly loaded machine without
many many cpus. So I threw it all out and wrote exactly the opposite.
What is it?
BFS is the Brain Fuck Scheduler. It was designed to be forward looking only,
make the most of lower spec machines, and not scale to massive hardware. ie
it is a desktop orientated scheduler, with extremely low latencies for
excellent interactivity by design rather than "calculated", with rigid
fairness, nice priority distribution and extreme scalability within normal
load levels.
Extreme scalability within normal load levels? Isn't that a contradiction?
For years we've been doing our workloads on linux to have more work than we
had CPUs because we thought that the "jobservers" were limited in their
ability to utilise the CPUs effectively (so we did make -j6 or more on a
quad core machine for example). This scheduler proves that the jobservers
weren't at fault at all, because make -j4 on a quad core machine with BFS
is faster than *any* choice of job numbers on CFS. See reverse scalability
graph courtesy of Serge Belyshev showing various job numbers on a kernel build
on a quad core machine. The problem has always been that the mainline
scheduler can't keep the CPUs busy enough; ie it doesn't make the most of
your hardware in the most common situations on a desktop! Note that the
reverse scalability graph is old; the scalability has improved since then.
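(Not part of the FAQ: a small Python sketch of the job-count choice discussed above. The helper name make_command and its jobs_per_cpu parameter are hypothetical; only os.cpu_count() and the make -jN flag are real.)

```python
import os


def make_command(jobs_per_cpu=1.0, cpus=None):
    """Build a make invocation with a job count derived from the CPU count."""
    # Fall back to 1 if the CPU count cannot be determined.
    cpus = cpus or os.cpu_count() or 1
    jobs = max(1, round(cpus * jobs_per_cpu))
    return ["make", f"-j{jobs}"]


# One job per CPU on a quad core, the case the FAQ says BFS handles best.
print(make_command(cpus=4))
# The older habit of oversubscribing, e.g. 1.5 jobs per CPU.
print(make_command(jobs_per_cpu=1.5, cpus=4))
```

The list can be handed to subprocess.run() inside a kernel source tree to time a build at different job counts.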
Why "Brain Fuck"?
Because it throws out everything about what we know is good about how to
design a modern scheduler in scalability.
Because it's so ridiculously simple.
Because it performs so ridiculously well on what it's good at despite being
that simple.
Because it's designed in such a way that mainline would never be interested
in adopting it, which is how I like it.
Because it will make people sit up and take notice of where the problems are
in the current design.
Because it throws out the philosophy that one scheduler fits all and shows
that you can do a -lot- better with a scheduler designed for a particular
purpose. I don't want to use a steamroller to crack nuts.
Because i
Haven't run Linux as my personal OS since 2003 but I had a lot of time (pun intended) for CK's schedulers. Now a whole new generation of youngsters can finally learn what a _REAL_ LKML flamewar looks like ;-)
I always get a much smoother and more responsive X experience on FreeBSD. An extreme example: years ago I could run FreeBSD in VMware (headless, connecting with a native X terminal, i.e. WinX32) just like a native machine, while Red Hat was as slow as a snail. Is this because FreeBSD is genuinely fast?
"i.e." should be used after a statement to explain it another way
Remove the [sic]
http://askville.amazon.com/define-correct-usage/AnswerViewer.do?requestId=5300847
it's in my head
Still some grudge towards Torvalds and Molnar? From the FAQ:
Are you looking at getting this into mainline?
LOL.
No really, are you?
LOL.
Really really, are you?
No. They would be crazy to use this scheduler anyway since it won't scale to their 4096 cpu machines. The only way is to rewrite it to work that way, or to have more than one scheduler in the kernel. I don't want to do the former, and mainline doesn't want to do the latter. Besides, apparently I'm a bad maintainer, which makes sense since for some reason I seem to want to have a career, a life, raise a family with kids and have hobbies, all of which have nothing to do with linux.
Reminds me of this XKCD.
I don't have 4096 CPUs, good job Con Kolivas!
CFS can't even cope with a CPU-bound application.
Who here runs Linux on anything with more than 16 cores? Why should everyone else get the shitty end of the stick just because of maybe a dozen institutes with deep pockets?
16 sounds like a ridiculously high number for a desktop but is it?
Already we have 4-core processors which have "soft" additional threads (Intel's HT for instance), and some people already have dual-CPU desktop machines, meaning they are already at the 16 CPU limit.
Roll on 12-18 months and we'll be seeing 8-core CPUs with 8 soft-cores coming in on top-end desktops. Roll forwards 3 years and you'll be seeing 32-core CPUs with 32 soft-cores, which is where the scheduler breaks down.
So the problem here is that this is a brilliant optimisation for today and for pieces like the netbook market but won't be good for the desktop market long term.
With Linux looking to be strong in the netbook market however it does say that having a more efficient scheduler for that market would be a better idea than just optimising everything for the server side.
An Eye for an Eye will make the whole world blind - Gandhi
The FAQ:
Sorry, it's not the right tool for me so it's not worth me investing the time
in setting one up.
C'mon Con. DSCM is a great way to distribute forks of software. If you don't like git (I don't) there is a mercurial mirror of the linux kernel available and hosting a repository is dead easy. There are plenty of free options anyway. Or ask me.
http://michaelsmith.id.au
Maybe if someone were to hook this up with OSS in Ubuntu... we could be looking at a distro that's finally suitable for music production?
I only care about 200-800MHz single core ARM performance. When I do have a dual-core ARM, I'm only running Linux on one core in that situation. Not only am I an evil bastard who doesn't care about desktop performance, like those nasty server-oriented kernel maintainers, I also don't care about server performance!
That said I think I like his scheduler for embedded. I may have to try the patch out at work and see how many apps and drivers choke because it exposes their races.
I could do without his emotional baggage in his BFS faq though.
“Common sense is not so common.” — Voltaire
Why have you put an editorial "sic" in there? "i.e." is perfectly valid in the context in which it was used, it's an abbreviation of the Latin, "id est", or "that is".
The quote, if read in a manner expanding the abbreviation, would read "...and not scale to massive hardware. That is, it is a desktop orientated scheduler..." I would probably have changed the full stop after "hardware" to a semicolon, but that's me.
Yeah, I had a sig once; I got bored of it.
If they really wanted maintainability they would have changed to microkernel architecture years ago.
A lot of options are available before compiling the kernel; couldn't the choice of scheduler be one of them? Wouldn't it definitely be a great enhancement for the portable platforms?
Does this mean that Flash will **finally** run pause-free on the linux desktop?
Linux minty 2.6.30-ck-bfs208 #1 SMP PREEMPT Sun Sep 6 12:22:52 CST 2009 i686 GNU/Linux
OK, so I haven't compiled in a long time, but I thought I'd give it a go. The performance/responsiveness improvement was noticeable on my Eee PC 901 running Linux Mint 7. I also later changed the other options he talked about, including tickless, 1000 Hz, and preempt on. I can't complain: no crashes, and Flash and audio are better than before. Add a dose of Opera 10 and you've got a sweet little netbook.
Sorry, even my 2008 desktop is a multiproc multicore numa machine now (opterons). Kolivas is just wrong. The linux O(1) scheduler behaviour is a major win not worth sacrificing for increasingly obsolete old computers.
AMD has 6 core right now and 12 in 2Q10. Intel and AMD will have 16 core by end of 2010 or early 2011. They are designed to run in multi-socket systems.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
...is how many other scenarios there have been where someone had code for the kernel which was better than the default, but which got arbitrarily rejected by Linus out of hand. This might be a high-profile case, but I'll bet money that it's probably nowhere close to having been the only one.
The benevolent dictator model, when it works, is a good thing. However, Linus, like all of us, is human, and he's also been working on the kernel for a long time now.
There would have to have been times when he has made the wrong decisions, and something tells me that Con Kolivas represents one of them.
'I've done what I swore an oath to God two years ago to never do again; I've created "something that compiles". And in that purpose, I was a success. I've done this because, philosophically, I'm sympathetic to your aim. I can tell you, with no ego, this is my finest scheduler. If, on your journey, you should execute God, God will be assigned appropriate and fair system resources!'
doing his best to stir the pot and not let the food stick.
Sometimes a little revolution is a good thing.
Someone returning after self-imposed ostracization to the LKML, and offering up an as-yet-untried (by third party) -- much less accepted -- scheduler rates higher than a new release of Asterisk?
*Must* be a slow news day.
It is pretty tough to maintain when the lead developer takes extended vacations with the Rough Riders and the Bitch Babies at their favorite hideaway. Oh, for information, it seems that Hans is one of the bitch babies.
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
YES! I'm so happy that -ck is back, and hope that Con's skills will be finally acknowledged more than before.
Welcome back Con! I wonder how long it is before Ingo "Kudos Con" Molnar rips off the new design? The kernel team has developed a very bad case of "not invented here." http://kerneltrap.org/node/8059
an ill wind that blows no good
However many computer users are discovering that what they really want on the desktop is a server/workstation OS. MS Win98/ME just doesn't cut it in comparison to NT, Win2k, XP "Pro", Server2003/8, Win7 64bit or linux. You get I/O hassles even burning data onto DVDs or loading large files so server style scheduling is nice on the desktop. Going back to even time slicing like MSDOS is really a bit like those guys that think they have stumbled on an automotive conspiracy and they tune their cars to run perfectly at idle instead of under load and end up using almost no fuel per hour. While things may perform a vast amount better at idle you get worse performance under load which is when you want it.
I get the impression that we have here an idea that all the kernel developers heard of in history of computers 101 and are dismissing it either because they thought it through while still students or dismissed it because it was abandoned some time ago. Con has come in from the outside without preconceptions and has either brought up something that might just work but has been dismissed out of hand or has hit the learning curve the other kernel developers went through while still students. Sometimes that gives you an advance and sometimes you just get people frustrated that they have to teach the new guy that isn't listening and won't even run the thing he's working on as his main system so can't really see the results. It's also annoying when someone brings a gun to a fist fight in discussions - none of us here could do his day job without years of the same experience and most of us couldn't even do it with that so suggesting that developers try doing his job for a day is a bit much.
I understand and appreciate what Con is doing. I also completely understand and appreciate what Ingo and the kernel guys are doing and saying and what they've done. The Linux scheduler has gotten much better over the years, especially for servers. The other thing is OS X and Windows don't have pluggable schedulers and yet they both seem to strike a nice balance with great desktop performance as well as legitimate servers. I also understand that they don't have a lot of tuning knobs and the rationale behind exposing those knobs to the users (seems everyone is an 'expert' at process scheduling...)
I can't help but think there is a piece or a couple pieces of information that the scheduler seems to not have which could solve this problem. It seems like there is some missing context or something where if they could easily identify "desktop processes" vs. "mostly idle daemons" or something we could come up with a better set of scheduler priorities or some different heuristics to improve the desktop feel without starving other processes. Maybe it's something as simple as promoting the priority of processes run by someone logged in at the console, I don't know.
I know you're pissed off, Con, and a nice "f-you" to the kernel team might feel good but don't you think solving the actual problem instead of just providing a band-aid would be a better use of everyone's energy?
A modern OS kernel, however, often has a lot more in common with microkernel designs even if it's all running in a single address space. Take a look, for example, at the OpenSolaris network stack. Every component runs in a separate thread and communicates with those above and below via message passing. It would be trivial to separate these out into different userspace processes, but there's no real advantage to doing so.
No advantages? Apart from the fact that suddenly, the OS can enforce the fact that you are using message passing, rather than shared memory? Or the fact that a kernel component can crash/be compromised without necessarily crashing/compromising the whole system. The one strong argument against microkernels has always been performance, and as the number of cores increases it is less and less relevant...
I wonder what BeOS had, that was so good. I mean, was it a scheduler thing? Or was it the pervasive multithreadedness that the OS almost forced upon the developers? Whatever it is, it worked like black magic: BeOS would always listen to the user input, no matter what the heck it was doing in the background, no matter what insane load was on the CPU - your mouseclicks were always reacted upon immediately, your drags were always reacted upon immediately, your typing, resizing, brushstrokes, midi-signals, whatever, always, under any circumstance, were immediately and smoothly followed by the correct response.
I know nothing of BeOS. However, my perception is that the only time that a modern Linux (probably also Windows) becomes unresponsive, short of really abusive load, is when an application hoards too much memory and gets the computer swapping heavily. At that point the scheduler cannot really prevent unresponsiveness, because it has to swap stuff in and back out whenever it changes the running task...
It would be interesting to see people like Con and Alan branching off the mainline kernel. This way we could create more competition between the branches, which would eventually improve the kernel as a whole.
"I don't have 4096 CPUs, good job Con Kolivas!" - by boldie (1016145) on Sunday September 06, @06:39AM (#29330241)
Neither do I (know any END/HOME users that do? I don't... lol!). Yes, I wrote about this before, see my subject-line above: I felt that Linus Torvalds is pulling a "Edison to Tesla" maneuver here (ala "The War of the Currents"), & cannot STAND the fact that someone outwrote him or his other colleagues (Like Ingo Molnar (that KISS HIS A$$, & "Support his position" - which, iirc? WAS WHAT LINUS TORVALDS/PENGUIN #1 SAID to Con Kolivas, when this all came to light)), is all.
I.E.-> The fact that someone (Con Kolivas) did something BETTER (for end-users no less, the folks that matter vs. L.T. or Ingo Molnar his stooge/crony) & he cannot stand it & 'outed/dissed' this guy Con Kolivas.
I think this Ingo Molnar fellow may be some "best buddy" of Linus Torvalds @ this point & his work being outperformed by others like Mr. Kolivas here is going to show the world "what's-what" on this account, as far as process scheduler superiority (especially for END/HOME users of Linux - & that? THAT IS WHERE LINUX NEEDS TO GROW THE MOST (it's already a PROVEN server in many scenarios is why, time to work on the "end user" front now imo)).
Personally? Linus Torvalds MAY be a good programmer, but, inventing an OS which is based off of another previous OS is not such a "HUGE ACCOMPLISHMENT" imo @ least - it's only taking a base design & improving it (WITH THE HELP OF MANY OTHERS, no less)... he stood on the shoulders of giants, like the rest of us, in other words, & "deifying him" is not a good idea really. He's just a man, like the rest of us, & CAN MAKE MISTAKES as well.
E.G.-> If this works out as BETTER for end users? I think L.T. or his crony/pal Ingo Molnar made a HUGE mistake in hassling this guy Con Kolivas, personally... not a good move.
(Especially not if Con Kolivas' work here shows it is superior to CFS etc., for end users use).
APK
P.S.=> IF this scheduler works for people, ESPECIALLY "end user" type folks (home users etc. et al)?? Let it OUT to the masses... let them try it out, & see their feedback on it (as it is the BEST WAY to get improvements out of it, by field testing it on MANY subjects & finding bugs or specific use-cases where it does NOT work out well for them, as "criticism is worth 1,000 praises" (&, it's a devs BEST FRIEND, speaking as one here myself, as both dev & end user))... apk
Why must I always be Number Two? Amen, seconded.....
Leave it to a Slashdot reader to hyper-correct a kernel dev on grammar. "i.e." comes from Latin "id est", meaning "that is". Let's re-read the quote with that substitution made.
Seems fine to me.
sounds like teh name of a super-villain. cool!!
I've compiled kernels before but I haven't patched them. It doesn't look too hard, except that it doesn't seem to say anywhere which kernel version to apply the patch to. Latest git? Latest stable? Mainline? Any ideas?
This is how the loudness war is killing music.
See, that is where you are wrong, as I have already forgiven you ;-)
In fact you are a major asshole if you think I care what you think of me. You might note my SlashID (and compare to yours) before trying to tell me how people are around here. There are many, many, many stupid, lying, or just plain moronic people here on Slashdot. You're naive if you think differently. When you deal with enough idiotic trolling assholes, it can make one somewhat of a jerk for the moment as well. I don't hide behind AC posts, but then I'm a grown man who owns up to my actions. Clearly, you haven't gotten there quite yet (based on your unsolicited and absurd "advice".)
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
http://lwn.net/Articles/351058/
Basically, he can not see any BFS performance improvements, on this box (Dual Quad-core).
wow thats running so much smoother. thanks con
I hope you guys are following this interesting conversation
http://thread.gmane.org/gmane.linux.kernel/886319
I Fucking love Con. He's just awesomely talented, passionate and dedicated (as well as having many other traits at levels that can only be described as "awesome").
We need people like this to counterbalance other people like Linus, Shuttleworth, Stallman et al (who are all also awesome in different ways but conflict with each other's awesome somewhat).
please stick around CK you have many fans.
I use Fedora and I prefer to do updates with yumex. If I overlay the yumex window (while it is working) with another window, I can wait as much as 30 seconds before the yumex window updates (refreshes). This could be a yumex design problem, but I take it as a scheduler problem. In my IBM mainframe days, the operating system was MVS (OS/390 today). The concept was to create performance groups. The interactive performance group had high priority, while good old batch had low priority, with tasks scheduled within their performance groups. As far as I can remember, performance groups were prioritized by the system administrator. It would mean a change to Linux to classify an application into a performance group. Tar, cp, and a few other apps could be in the batch group, receiving lower priority status. Just my thoughts on scheduling: schedule performance groups, and schedule tasks within a performance group.
Leslie Satenstein Montreal Quebec Canada
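The performance-group idea in the comment above can be sketched roughly as a two-level pick: choose the highest-priority non-empty group first, then the next task within it. This is a toy model, not how MVS or Linux does it; the group names, the GroupScheduler class, and the classification table are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical groups with administrator-assigned priority (lower is served first).
GROUPS = {"interactive": 0, "batch": 1}


class GroupScheduler:
    def __init__(self, classification):
        # Maps an application name to its performance group.
        self.classification = classification
        self.queues = {group: deque() for group in GROUPS}

    def enqueue(self, app):
        # Unclassified applications default to the interactive group.
        group = self.classification.get(app, "interactive")
        self.queues[group].append(app)

    def pick_next(self):
        # Level 1: groups in priority order. Level 2: FIFO within the group.
        for group in sorted(GROUPS, key=GROUPS.get):
            if self.queues[group]:
                return self.queues[group].popleft()
        return None


sched = GroupScheduler({"tar": "batch", "cp": "batch"})
for app in ["tar", "yumex", "cp"]:
    sched.enqueue(app)
order = [sched.pick_next() for _ in range(3)]
print(order)
```

Note that strict group priority like this can starve the batch group whenever interactive work never runs dry, so a real design would need some form of aging or a guaranteed share for lower groups.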
The cited use of i.e. seems perfectly reasonable. When i.e. and e.g. are so often used incorrectly, nobody recognises the correct usage any longer!
Author, Shell Scripting : Expert Re