Can someone fill me in... Hasn't Microsoft been claiming Windows has been preemptive since Win95??? Is this some other form of 'preemptiveness'?
You are thinking of forms of multitasking. One form is preemptive, in which tasks are given a specific period in which to run (a timeslice) and are then forcibly preempted by the next runnable task when that quantum ends. Win95, NT, all Unices, and anything decent fit in here.
The other form is cooperative, in which tasks run until they yield execution. This is how Win 3.1 works: tasks ran until they finished processing their current Windows message or called yield().
This article is about a preemptive kernel, where the same ideas actually apply. Inside the kernel, things are currently cooperative in the sense that kernel code runs until it completes or yields control. This patch makes it preemptive -- it will be preempted when something more important needs to happen.
Win95 does not have a preemptive kernel (it isn't even reentrant). NT might. Solaris does. Linux does with this patch.
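A toy model of the two forms of multitasking described above may make the difference concrete. It assumes every task becomes runnable at time 0, tasks are served in array order, and context switches are free; the function name `first_run_delay` is invented for illustration and is not any real scheduler's API.

```c
#include <stddef.h>

/* Time (in arbitrary ticks) before task `idx` first gets the CPU, assuming
   all tasks become runnable at time 0 and are served in array order.
   quantum == 0 models cooperative scheduling: each earlier task runs until
   it is done (in real life, until it voluntarily yields). */
unsigned long first_run_delay(const unsigned long *run_len, size_t n,
                              size_t idx, unsigned long quantum)
{
    unsigned long delay = 0;
    for (size_t j = 0; j < idx && j < n; j++) {
        if (quantum == 0 || run_len[j] < quantum)
            delay += run_len[j];   /* task j finishes before we get a turn */
        else
            delay += quantum;      /* task j is preempted after one quantum */
    }
    return delay;
}
```

With run lengths {100, 50, 10}, the third task waits 150 ticks under cooperative scheduling but only 20 ticks with a 10-tick timeslice; the work is the same, only the order in which it gets done changes.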
It could have; I just seem to remember a 1.0 kernel.
Can anyone give a nice layman's description of what he is talking about here?
Basically, I am explaining the modifications we made to the kernel in order to make it preemptible. To put it more in layman's terms: besides allowing the kernel to preempt itself as needed, we also had to prevent certain situations from being preempted. This is the same situation as with SMP. We use SMP's locks to disallow preemption, for reasons of concurrency and reentrancy. We also can't preempt during interrupt or BH (bottom half) handling, because those were not designed for concurrency either.
To sum it up, we have to prevent preemption in some situations. Those situations are: while locks are held, while handling interrupts and bottom halves, and while inside the scheduler itself.
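The rule summarized above can be modeled with a per-task counter: anything that must not be preempted (holding a spinlock, running the scheduler) bumps the counter, and a pending preemption is deferred until the counter drops back to zero. This is a single-threaded sketch loosely inspired by that idea; the names (`preempt_count`, `need_resched`, `try_preempt`, the `*_model` lock wrappers) and their exact behavior are assumptions for illustration, not the patch's code, and interrupt context is simply passed in as a flag.

```c
#include <stdbool.h>

/* Hypothetical per-task state for the model. */
struct task_state {
    int preempt_count;   /* > 0 means "do not preempt me right now" */
    bool need_resched;   /* someone more important is waiting */
    int preemptions;     /* how many times we were preempted (for the demo) */
};

/* Stand-in for switching to the more important task. */
static void preempt_schedule(struct task_state *t)
{
    t->preemptions++;
    t->need_resched = false;
}

static void preempt_disable(struct task_state *t) { t->preempt_count++; }

static void preempt_enable(struct task_state *t)
{
    t->preempt_count--;
    /* Leaving the last non-preemptible region is itself a preemption point. */
    if (t->preempt_count == 0 && t->need_resched)
        preempt_schedule(t);
}

/* Taking a spinlock also disables preemption (the lock itself is omitted).
   The scheduler and BH code would bracket themselves the same way. */
static void spin_lock_model(struct task_state *t)   { preempt_disable(t); }
static void spin_unlock_model(struct task_state *t) { preempt_enable(t); }

/* Called when, e.g., a wakeup wants to preempt the current kernel code.
   Returns true if the preemption happened immediately. */
bool try_preempt(struct task_state *t, bool in_interrupt)
{
    t->need_resched = true;
    if (t->preempt_count > 0 || in_interrupt)
        return false;   /* deferred: a later preempt_enable() will do it */
    preempt_schedule(t);
    return true;
}
```

In this model a preemption requested while a lock is held is not lost; it fires the moment the last lock is released.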
I originally felt I should stay out of any discussion here, but I want to answer some of these questions and clear some of this stuff up. To be honest, it is a little embarrassing having everyone read and comment on the interview. :)
Bottom Half handlers generally are fast-track implementations to quickly deal with the interrupts. To avoid concurrency collisions of resources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.
Interrupts, even just the one in question, are not disabled during a bottom half, at least in general. The reason we can't preempt bottom halves is that they are guaranteed to be serialized with respect to CPUs (i.e., a given BH runs on only one CPU at a time). Because of this, BHs are designed without regard to reentrancy, so we can't preempt them.
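The serialization guarantee described above can be sketched with one atomic "running" flag per bottom half. This is an illustrative model only: `struct bh_desc`, `bh_init()` and `bh_try_run()` are hypothetical names, not the 2.4 kernel's actual BH or tasklet code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bottom-half descriptor. The `running` flag is shared by all
   CPUs, so at most one CPU can be inside the handler at any moment. */
struct bh_desc {
    atomic_flag running;
    void (*handler)(void);
    atomic_int skipped;   /* times a CPU found the BH already running */
};

void bh_init(struct bh_desc *bh, void (*handler)(void))
{
    atomic_flag_clear(&bh->running);   /* start in the "not running" state */
    bh->handler = handler;
    atomic_init(&bh->skipped, 0);
}

/* Run the handler unless another CPU is already running it. Because of
   this guarantee, the handler never has to cope with a second copy of
   itself running at the same time, i.e. it need not be reentrant. */
bool bh_try_run(struct bh_desc *bh)
{
    if (atomic_flag_test_and_set(&bh->running)) {
        atomic_fetch_add(&bh->skipped, 1);   /* a real kernel would retry later */
        return false;
    }
    bh->handler();
    atomic_flag_clear(&bh->running);
    return true;
}

/* Example handler for the demo. */
static int demo_runs;
static void demo_handler(void) { demo_runs++; }
```

Handler code written under that guarantee is exactly the kind of code that is unsafe to preempt and re-enter.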
All in all, this is basic non-preemptive stuff. What I don't understand is that this strategy that he is defining is a textbook NON-preemptive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully-preemptive here.
Hardly. Would you say an SMP system is not SMP if it is non-concurrent inside critical sections? No, you wouldn't, and this is the same situation we have here with preemption.
We can't preempt inside critical regions. We have concurrency and reentrancy concerns, just like SMP does. We also can't preempt inside interrupt handlers or BHs because they aren't designed to be preempted (nor would you want to interrupt the top half of an interrupt, anyhow).
The current kernel is not preemptive _anywhere_. In fact, the only way kernel code ever yields execution is if it explicitly does so or returns. Since with the preempt-kernel patch we can now preempt in 90% of the kernel, I think it's safe to say we have a preemptible kernel now.
Re: Linus' opinion about preemptive kernel (on "Torvalds Tells All")
I like Linus' standpoint about the preemptive kernel. I agree with him that one should eliminate the causes of latencies instead of only addressing the symptoms.
Disclaimer: I am the maintainer of the preempt-kernel patch
I wouldn't say the preempt-kernel patch addresses symptoms without addressing the cause, nor would I say that is even what Linus is saying. The problem is that the kernel is not preemptible, so kernel code runs to completion, and thus long-running kernel code causes long system latencies since nothing else can run in the interim.
Addressing the problem is exactly what the patch does. It makes the kernel preemptible and -- voila -- the problem goes away. Addressing the symptoms is the current approach: sprinkling conditional scheduling statements around the kernel so that long-running code periodically gives others a fair chance to run.
Even so, I don't think Linus thinks we are addressing the symptoms and ignoring the cause. He is just interested in seeing whether replacing kernel algorithms with better ones, or using conditional scheduling points, can fix the problem for most people without the need for a preemptible kernel (i.e., a simpler solution). Otherwise, I'd bet he is quite favorable toward our design.
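For comparison, the conditional-scheduling approach mentioned above looks roughly like this: long-running kernel code checks a flag at points someone picked by hand and yields only if a reschedule is pending. A minimal single-threaded sketch; `need_resched`, `schedule_stub()`, `conditional_schedule()` and `long_kernel_loop()` are hypothetical stand-ins, not the low-latency patch's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler state. In a real kernel, need_resched would be set
   asynchronously by a timer tick or a wakeup. */
static bool need_resched;
static unsigned reschedules;

static void schedule_stub(void)
{
    reschedules++;          /* a real schedule() would switch tasks here */
    need_resched = false;
}

/* An explicit reschedule point: give up the CPU only if asked to. */
static void conditional_schedule(void)
{
    if (need_resched)
        schedule_stub();
}

/* A long-running loop (think: walking a huge list) with a hand-placed
   reschedule point. wake_every simulates another task becoming runnable
   every so many iterations (0 = never). */
size_t long_kernel_loop(size_t iterations, size_t wake_every)
{
    size_t work = 0;
    for (size_t i = 0; i < iterations; i++) {
        work++;                                        /* one unit of work */
        if (wake_every != 0 && (i + 1) % wake_every == 0)
            need_resched = true;                       /* simulated wakeup */
        conditional_schedule();
    }
    return work;
}
```

The difference from a preemptible kernel is that latency is only bounded where someone added the call; any code path without one still runs to completion.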
U.S. residents are not currently eligible to take part (only French residents are permitted).
that is odd (although i am not doubting it): i got an email from Mandrakesoft asking me to fill out an html form with my personal information since i was a "Community Member/Contributor" (i am a credited linux kernel contributor).
i assumed this was for IPO (or other stock) access, but the CREDITS file clearly lists me as a US resident -- so, if the stock is French-only, why did they bother? i originally questioned why i received the notice, because i assumed the stock would be on a french market and thus off-limits to me.
perhaps there are some other benefits? did anyone else receive the email? i was not planning on taking advantage of the offer, but it was interesting nonetheless.
I don't think it gets more fair than this. KIllustrator is grossly in violation of Adobe's trademark. Slashdotters all agreed on this on the previous article.
KIllustrator is a clone of Illustrator: it's the same type of product (software), doing the same thing (vector drawing). Of course the name is meant to capitalize on Adobe's product!
Now, Adobe has several possible courses of action, and the one they are taking is by far the mildest and most polite. They need to protect their trademark, so they need KIllustrator to get a new name. They could take this to court; they could aim for monetary damages. They aren't.
I think we should applaud Adobe on this. They have to protect their trademark, after all. If nothing else, they are helping a KApplication get a better name.
almost sadly, this is a real config option. it's 'make adventure' in cml2 1.6.0. it is kind of neat to play with, actually.
esr has some ideas planned for future versions, see your cml2-1.6.2/TODO.
when esr first posted the announcement of this new feature to the kernel list, I posted a reply to him on lkml along the lines of "_WAY_ too much time on your hands, don't you have some linux advocacy to be doing? :)"... he replied that he wrote this on a plane flight.
for those of you ready to take the jump to cml2, it is available at http://tuxedo.org/~esr/cml2/. You will need python2 (RedHat PowerTools has an RPM, Debian has a DEB in stable) to use it. you will also need tkinter2 if you want to use `make xconfig'. installation is simple.
i was one of the people initially against cml2 on lkml (i didn't want to install python2, and i didn't think cml1 was broken -- and hence we were creating problems), but now i am pretty impressed by cml2. it's still not a must, imo, but i like it. it should be integrated in 2.5.2.
> take SCSI
It won't budge.
> take SCSI
Seriously, it is not going to move an inch.
> take SCSI
You try, but it won't move.
> take SCSI
It moves a little!
> take SCSI
SCSI: taken.
I have a CD collection, and this is news to me. In fact, I doubt that anyone I know was aware that their CDs may be lunch for a putrid fungus.
How do you figure anyone with CDs will know about this?
First, I meant "scarey", not "scarely"; maybe you thought I meant "scarcely"?
At any rate, I meant that it is scarey -- i.e., bad -- news for those of us with CD collections (although, as I say later, not a big deal). However, it is not surprising that such a fungus exists, because metal-corroding fungi are well-known.
The Geotrichum genus consists of filamentous fungi. It is typically characterized by chains of slimy spores, often with strong odors (they can stink).
It is not odd for this genus to corrode metal; Geotrichum is probably the most common type of fungus found on metal.
Geotrichum candidum was once considered a contaminant on the surface of cheeses (it would naturally grow there). Now, however, because it grows so quickly, it is used to inoculate the surface mould to encourage ripening.
While this is scarely news for anyone with a CD collection, it certainly is not surprising. Best of all, however, is that I don't think we have to worry about it too much -- if at all, especially in the US.
First, MS is using FreeBSD on some advertising and DNS servers -- hardly _depending_ on it for the main Hotmail stuff.
Second, I am sure they really do plan on moving to Windows.
Also, while MS does often attack all of open source, it is really the GPL they target. FreeBSD is not GPLed, so it's not their number one enemy.
So, this is not that big of a deal. I am sure Microsoft will change these systems to Windows at some point. What will be interesting is if we observe lower uptimes once everything is Windows!
From the article (in reference to SMP OS support):
There are exceptions to this rule, such as the Mac, where Photoshop has a patch which has multithreaded support for G4's even though no Mac OS below Mac OS X supports multiprocessing. Generally speaking, the computer needs OS support as well as application support to take advantage of multiprocessing.
Maybe I am not reading this correctly, but is it saying that a patch to an APPLICATION gave the OS SMP support? I don't see how this could work... SMP support needs to come from the OS. Not only would the OS need to be designed from the ground up to support multiple CPUs (i.e., the scheduler, interrupts, etc.), but it would also need to be thread-safe itself (proper locking, etc.).
So, either tell me "yah, of course, the article is wrong" or can anyone explain how a patch to Photoshop could give it SMP support on a non-SMP-aware OS?
that is not how copyright licenses work. the GPL is not a contract, it is a license to use a copyrighted work.
if it were a contract, you would have to agree to it. since it's not, you don't.
instead, before the license, you have no rights. you don't own the material. the GPL grants you permission to use the copyrighted material. so your use implicitly shows agreement with the GPL (or violation of copyright).
My understanding is that gcc does not do as good a job optimizing as some other compilers.
actually, gcc is one of the most proficient optimizing compilers around. i also don't see how "not scientific" translates to "any dumb mistake is permissible" -- if a floating point operation (which was certainly optimized away by ANY compiler) changed the results, how is it "all the same"?
All in all, this is the best comparison between FreeBSD and Linux I've seen yet.
there are a few reasons... one of the main reasons is that connection reporting falters with syn cookies enabled. since there are no TCBs until the connection is made (in other words, nothing is stored about a connection during the handshake), the only way to log the events is via filtering with something like tcpdump. during a heavy syn flood, the sniffer may drop packets. not a big deal, in my opinion. the Configure.help for CONFIG_SYN_COOKIES mentions this.
originally there was a concern that syn cookies would impose a noticeable performance hit, but linux currently uses a hash to generate the initial sequence number anyhow, so it isn't a big deal.
there are some other technical issues, although linux's implementation is supposed to be free of most (all?) of them. see my previously posted link for the e-mail discussion of the implementation, where the issues are discussed. the "new" proposal actually has some issues ours does not; Alan Cox posted to linux-kernel earlier mentioning a flaw. a year earlier, and an order of magnitude better -- linux :>
currently, the config option defaults to off, and even when it is enabled, the sysctl defaults to off (you need to "echo 1 >/proc/sys/net/ipv4/tcp_syncookies" to enable them).
i think i agree, because it is an extra feature that should default to off (or just be a standard, non-toggleable part of the OS). but if you do enable it, the sysctl (a Good Thing) should default to on, imo.
we put this into the linux kernel in early 2.1. the define is CONFIG_SYN_COOKIES and it also needs a sysctl/proc option to be enabled (net.ipv4.tcp_syncookies)
basically, you make your syn (the initial sequence number in your SYN-ACK) a function of the session's data (local and remote port, local and remote address, and the client's syn) and some secret. then when the client's ACK returns acknowledging your syn (along with all the original data), you can recompute the function to see if it matches.
if you make this function a cryptographically strong one-way hash and use a good secret, the cookie is practically unpredictable.
in linux, we do exactly this:
our_initial_syn = one_way_hash(src.port, dst.port, src.host, dst.host, secret) + src.initial.syn;
by adding on, and not using as an argument, the initial syn, we can keep our syns properly spaced. the secret is a counter that is incremented every minute. this acts as a method for checking for old acks, too.
the TCB is not created until the final ACK is received. the MSS is encoded in 3 bits of our syn (so our syn is 2^28 bits secure). no TCBs until the ACK means no queue to fill up... it works.
see http://cr.yp.to/syncookies.html for the initial discussion of implementation.
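The scheme described above can be sketched as follows, under loudly stated assumptions: the hash is a toy FNV-1a mix standing in for a cryptographic one-way hash (it is NOT secure), the "counter incremented every minute" is passed in explicitly as `count` alongside a separate `secret`, cookies up to one counter tick old are accepted, and the MSS travels as a 3-bit index added to the cookie. The names `toy_hash`, `make_cookie` and `check_cookie` are hypothetical; Linux's real cookie layout differs in detail.

```c
#include <stdint.h>

/* Connection identity as seen by the server. */
struct conn_id {
    uint32_t saddr, daddr;   /* IPv4 addresses */
    uint16_t sport, dport;   /* ports */
};

/* Placeholder mixing function (FNV-1a over the fields). NOT a
   cryptographically strong one-way hash; a real implementation must use
   one, or attackers can forge cookies. */
static uint32_t toy_hash(const struct conn_id *c, uint32_t count,
                         uint32_t secret)
{
    const uint32_t words[6] = { c->saddr, c->daddr, c->sport, c->dport,
                                count, secret };
    uint32_t h = 2166136261u;                /* FNV offset basis */
    for (int i = 0; i < 6; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;                  /* FNV prime */
        }
    }
    return h;
}

/* Our ISN for the SYN-ACK: hash + the client's ISN (added, not hashed, so
   our ISNs stay spaced like the client's) + a 3-bit MSS index. */
uint32_t make_cookie(const struct conn_id *c, uint32_t client_isn,
                     uint32_t count, uint32_t secret, unsigned mss_idx)
{
    return toy_hash(c, count, secret) + client_isn + (mss_idx & 7u);
}

/* Check the client's final ACK: ack_seq acknowledges our ISN + 1 and seq
   is the client's ISN + 1 (all arithmetic wraps mod 2^32). Returns the MSS
   index (0..7) on success, or -1 if the cookie is invalid or too old. */
int check_cookie(const struct conn_id *c, uint32_t ack_seq, uint32_t seq,
                 uint32_t now_count, uint32_t secret)
{
    uint32_t cookie = ack_seq - 1;
    uint32_t client_isn = seq - 1;
    for (uint32_t age = 0; age <= 1; age++) {   /* current or previous tick */
        uint32_t diff = cookie - client_isn
                        - toy_hash(c, now_count - age, secret);
        if (diff < 8u)
            return (int)diff;
    }
    return -1;
}
```

Nothing is stored between the SYN-ACK and the ACK: everything needed to validate the connection and recover the MSS is recomputed from the ACK itself.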
Wrong. Windows NT is a microkernel with a full multi-threaded server setup, similar (at least in theory) to HURD. The POSIX and OS/2 subsystems (proc systems that run on the central kernel) were never expanded because no one cared. You can buy a full POSIX2-compliant layer for NT (i forgot the name..) but it is not too popular, proving my point. NT has been ported to (and released on) x86, Alpha, MIPS, and PPC, but the ports were canceled due to a complete lack of public support (although Alpha was mildly popular, that was more political). There are Win64 versions for Alpha (canceled) and Merced. And who knows what else MS researches, but w2k will be x86 only. NT is a fairly robust and powerful system. Robert (sagei)
Isn't that what IBM's Power4 chip does? 4 cores on one silicon with certain shared resources....
Power4 is different: it is a multi-core CPU -- this means there are actually multiple (in Power4's case, two I believe) CPU cores on each die.
SMT just duplicates certain parts (say, the registers), and the resulting hardware threads share the rest of the core's resources.
Should this patch make a difference on SMP systems? I heard it used the same semantics as SMP support, only on UP systems. If that's the case there shouldn't be a difference. Is the SMP kernel fully preemptible by default?
Yes, it will. The SMP kernel is no more preemptible than the UP one. What we mean by saying "the preempt-kernel patch leverages the existing SMP locks" is that the Linux kernel is already protected against concurrency and reentrancy where needed, and we make use of that protection.
In other words, an SMP system will benefit from a preemptible kernel in the same manner a UP system will... without the patch, the kernel still runs to completion. That said, the effects will be a little less pronounced, since you have a second CPU to run tasks and thus scheduling latency won't be as bad... at least in theory, heh.
The patches are at kernel.org, but please use a mirror. The 2.4.16-pre1 patch is fine, as previously mentioned, but I'll put up a rediff against 2.4.16 soon.
Secondly, journalling does not mean that there's no fsck; it just means that it's an order of magnitude or four faster. This is because during the filesystem consistency check, we know *exactly* where to look for problems (thanks to the journal). This doesn't result in better data protection, but it does result in better availability (and hence uptime).
This isn't true. You do _not_ need to run fsck on a journaled partition. A journal does not simply say "hey, the problem is here, just fix these inodes!". A journal records exactly what should have happened and what did happen, so the inconsistent state can be repaired by "replaying" the not-yet-executed portion of the journal.
For all intents and purposes, you don't need fsck at all. For example, RedHat 7.2 will prompt and ask if you would like to fsck a dirty partition (after the journal replay). Most people say no. If you say yes, most likely nothing will happen, since everything is now consistent. It is for the paranoid.
Ext3 is pretty nice, btw.
I thought that what (certain) kernel hackers really objected to is preemption while locks are held. The complications (eg priority inversion) they talked about seem only to arise in that case.
There are a few reasons other hackers complain, although I didn't know this was one of them. Since MontaVista's original preemptive kernel work, I believe, we have never preempted inside of locks. Note that you can, but then you run into deadlock issues and the priority inversion you spoke of, and thus the need for priority inheritance.
So, first, does "fully-preemptive" traditionally mean with or without locks? Are Solaris, NT, and RTOS preemptible when locks are held?
I would say it means sans locks. None of the mentioned OSes are preemptive while holding a lock. You always have to respect the lock. Now, you can preempt while a lock is held and go do other things. If you do this, you are assuming the lock is going to be held for a long time (otherwise it is better to just spin for a cycle or two). In this situation you want to use semaphores, which we _do_ preempt during.
When a process hits a semaphore that is in use, it goes to sleep and something else continues. The process awakens when the resource is available. Now we reach the problems you wrote of above. What if task A holds resource Y and sleeps waiting for resource X, while task B holds resource X and sleeps waiting for resource Y? You deadlock. And what if a low-priority task holding a semaphore is preempted by a medium-priority task while a high-priority task waits for that semaphore? The high-priority task is now effectively stuck behind the medium-priority one: that is priority inversion.
Thus we need to use a type of semaphore called a priority-inheriting mutex, which temporarily raises the priority of the task holding a resource to that of the highest-priority task waiting on it, so the holder can complete and release the lock. I know Solaris has these. However, I would consider any kernel that can preempt itself in general a preemptible kernel.
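A minimal single-threaded model of the priority inheritance just described, assuming larger numbers mean higher priority and at most one waiter at a time. `pi_mutex_lock()` and `pi_mutex_unlock()` are hypothetical names; a real implementation also has to handle multiple waiters, chains of locks held by boosted tasks, and actually putting the waiter to sleep.

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
    const char *name;
    int base_prio;   /* the task's own priority */
    int prio;        /* effective priority, possibly inherited */
};

struct pi_mutex {
    struct task *owner;   /* NULL when the mutex is free */
};

/* Try to take the mutex. On contention, the holder inherits the waiter's
   priority if it is higher, and the caller would then go to sleep. */
bool pi_mutex_lock(struct pi_mutex *m, struct task *t)
{
    if (m->owner == NULL) {
        m->owner = t;
        return true;
    }
    if (t->prio > m->owner->prio)
        m->owner->prio = t->prio;   /* boost the holder */
    return false;
}

/* Release the mutex and drop any inherited boost. */
void pi_mutex_unlock(struct pi_mutex *m)
{
    m->owner->prio = m->owner->base_prio;
    m->owner = NULL;
}
```

While the low-priority holder runs at the boosted priority, a medium-priority task can no longer preempt it, which is exactly the inversion inheritance is meant to prevent. Inheritance does nothing for the A/B deadlock case, though; that one is usually avoided by always acquiring locks in a consistent order.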
Second, observed results aside, what reason do you have to believe that preempting the lock-less parts of the kernel is "good enough"? All else equal, one would expect the latency distribution to be similar with and without locks, so you would expect plenty of "worst cases" to occur with locks. Of course, there is already a pressure to reduce the time that critical locks are held, but I wouldn't be surprised to see non-contended locks (especially outside the kernel core) held for long times. So is there a good reason that the important "worst cases" happen without locks?
First, before I cast results aside, let me mention that observations show we are already lowering latency a great amount. But, you are right, periods in which locks are held are a problem. This is why I mentioned in the interview the use of things like Andrew Morton's low-latency patch, the preempt-stats patch (for finding the locks), etc.
Some of the problems still occur while locks are held, but thankfully the whole point of a spinlock is that it is held for a VERY short time. A solution for locks that are held too long may be to replace those spinlocks with a priority-inheriting mutex.
Disclaimer: It is my patch
but why does anyone need better latency? afaict, the latency here is strictly for people who want to do RT audio effects. this has nothing to do with audio playback, which has no latency sensitivity (because of buffering). this also has nothing to do with "feel", since humans are terribly slow, and cannot possibly feel the difference between 5 and 10ms.
You ever have an mp3 skip? Audio become out of sync in a game? That is caused by scheduling latencies becoming greater than the duration of the audio buffer. I.e., audio playback does not just need x units of CPU; it needs them every y units of time. The preempt-kernel patch helps alleviate this.
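The "x units of CPU every y units of time" point comes down to simple arithmetic: a buffer of N frames played at R frames per second lasts N/R seconds, and the player has to be scheduled again before it drains. A small sketch; the function names and the example sizes are illustrative assumptions.

```c
#include <stdbool.h>

/* How long (in milliseconds) a buffer of `frames` audio frames lasts when
   played at `rate` frames per second. */
double buffer_ms(unsigned frames, unsigned rate)
{
    return 1000.0 * (double)frames / (double)rate;
}

/* The player must run again before the buffer drains. If the worst-case
   scheduling latency exceeds the buffer's duration, the sound card runs
   dry and the audio skips, no matter how little total CPU playback needs. */
bool would_underrun(double worst_latency_ms, unsigned frames, unsigned rate)
{
    return worst_latency_ms > buffer_ms(frames, rate);
}
```

For example, a 441-frame buffer at 44.1 kHz holds exactly 10 ms of sound, so a 15 ms scheduling delay causes a skip while a 5 ms one does not. Bigger buffers hide more latency, at the cost of more delay between the application and what you hear.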
I hope that Linus will look at whether these patches hurt the normal case. "normal" means things like kernel compilation, not just an arbitrary latency measure and dbench (one of the least realistic benchmarks possible!)
Not only does preempt not hurt a kernel compile, but it helps it. I and many users have benchmarks. One of my requests from users is to get a lot of benchmarks and "feelings" so I can substantiate the patch. I am _not_ an audio guy. I use my Linux machine to code, go on the net, etc. just like 90% of the people here. Preemption helps me. I don't want to hurt the common case either.
Even so, it is a configure item. Merging it into the kernel does not equate to you having to use it. But I bet you would want to!
there are good reasons to be skeptical of all-out preemptiveness: it will unavoidably lower throughput in easy-to-define cases. any intro OS text will talk about optimal scheduling, where 'optimal' requires a definition of throughput or some other metric.
The cases in which we lower throughput are cases in which file I/O is favored, since it runs until completion. In this case, you can extend that argument to say that I/O-intensive tasks should just be cooperatively scheduled. An I/O task won't be preempted unless its timeslice has run out (i.e., it should be preempted, and it would be if it were in userspace). If the I/O is so critical, run it at a higher priority. Hell, maybe we should look into a longer timeslice.
Note that a lot of this is a non-issue, since we don't affect throughput (or we actually improve it!). In the cases where throughput is decreased, it is just by a couple of percent, which can be weighed against the increase in responsiveness some other application gets.
we need to set a target (5ms would be fine imo) and meet it. going beyond such a goal will hurt the normal case.
This is very very true, and an insightful point. One of the problems with this whole latency quest is that eventually we are going to reach some point and have to decide if enough-is-enough. We can always keep doing more and eventually the work _is_ going to be detrimental to the common-case. I agree we need to set a threshold and celebrate when we reach it. The super-special situations needing much lower latency can apply super-special solutions.
Disclaimer: It's my patch
:)
I think this is a good short-term solution for the latency problems, but I personally wouldn't include it in the main kernel releases. I believe that it *might* be a good idea to fork the kernel releases (temporarily) into two groups, one for servers and one for workstations, until the problems have been solved.
I tend to look at this more as a long-term solution, and I think people who see it as a short-term solution or hack are missing the point. First, this is a feature. We aren't kludging kernel code so that we can lower latency by stopping it when needed. We are effectively using the SMP code to multitask better within the kernel.
Second, forking the kernel over this is a terrible idea. Since it is a config setting, this is a non-issue anyhow, but I really don't want to see this thing forked off. In fact, I think the ideal situation is where we can get a preemptible kernel that benefits throughput so that server processes benefit from it as well.
I think that (for now) using this patch on workstations is a pretty good idea
Agreed
And I think there should be a better solution for the problem, which should THEN be something along the lines of kernel 3.0.
There isn't a better solution that is not a hack. There is a reason Solaris, NT, and all RTOSes are preemptible inside the kernel: it is the only way to achieve real-time response. You just _have_ to be able to respond to events when needed.
The "better" solutions in this case are "simpler" -- if we can hack some conditional schedules into places, perhaps simplify some algorithms, etc. then we can perhaps reduce latency without preemption. This is what Andrew Morton's low-latency patches do. But we need more. The point is not that preempt-kernel is a hack, but that it is a whole new high-tech feature, and some people want to find a simpler solution.
Personally, I don't think a simpler solution exists, and I believe the preemptive kernel solves other problems (and it is also a neat feature :>). Thus I work on it.
Can someone fill me in... Hasn't Microsoft been claiming windows has been preemptive since win95??? Is this some other form of 'preemptiveness'?
You are thinking of forms of multitasking. One form is preemptive, in which tasks are given a specific period in which to run (a timeslice) and are then forcibly preempted by the next runnable task when that quantum ends. Win95, NT, all Unices, and anything decent fit in here.
The other form is cooperative, in which tasks run until they yield execution. This is how Win 3.1 works. In 3.1, tasks ran until they finished processing their current Windows message or called yield().
This article is about a preemptive kernel, where the same ideas actually apply. Inside the kernel, things are currently cooperative, in the sense that kernel code runs until it completes or yields control. This patch makes it preemptive: it will be preempted when something more important needs to happen.
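A toy model may make the difference concrete. This is only an illustrative sketch, not how any real scheduler works: the task names and work units are made up, "cooperative" is simplified to "a task keeps the CPU until its work is done", and "preemptive" means a task is switched out after a fixed timeslice and goes to the back of the run queue.

```python
from collections import deque


def first_run_ticks(tasks, timeslice=None):
    """Simulate one CPU and return the tick at which each task first runs.

    tasks: dict of task name -> units of work, in run-queue order.
    timeslice=None models cooperative multitasking (a task runs until it
    finishes and gives up the CPU); an integer models preemptive
    multitasking (a task is forcibly switched out after `timeslice` units).
    """
    queue = deque(tasks.items())
    clock = 0
    first_run = {}
    while queue:
        name, remaining = queue.popleft()
        first_run.setdefault(name, clock)
        ran = remaining if timeslice is None else min(timeslice, remaining)
        clock += ran
        remaining -= ran
        if remaining:
            queue.append((name, remaining))  # preempted: back of the queue
    return first_run
```

With a 100-unit compile queued ahead of a 1-unit editor keystroke, `first_run_ticks({"compile": 100, "editor": 1})` has the editor waiting until tick 100, while passing `timeslice=10` lets it run at tick 10.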
Win95 does not have a preemptive kernel (it isn't even reentrant). NT might. Solaris does. Linux does with this patch.
I thought the Slack 2.0 release had a 1.1 kernel.
It could have; I just seem to remember a 1.0 kernel.
Can anyone give a nice layman's description of what he is talking about here?
Basically, I am explaining the modifications we made to the kernel in order to make it preemptible. To put it more for the layman: besides allowing the kernel to preempt itself as needed, we had to prevent preemption in certain situations. This is the same situation as with SMP. We use SMP's locks to disallow preemption, for reasons of concurrency and reentrancy. We can't preempt during interrupt or BH handling because those things are not designed for concurrency, either.
To sum it up, we have to prevent preemption in some situations. Those situations are: while locks are held, while handling interrupts and bottom halves, and while inside the scheduler itself.
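The rule above can be modelled with a single counter. This is a simplified Python sketch, not the patch's code: the names preempt_count, preempt_disable and preempt_enable are borrowed from the kernel's idea of a preemption counter, but the Cpu class and its methods are hypothetical. Taking a lock, handling an interrupt or bottom half, or running the scheduler all bump the counter; preemption happens only when the counter is back at zero and a more important task is waiting.

```python
class Cpu:
    """Toy model of when kernel code may be preempted (hypothetical class).

    preempt_count > 0 means "not safe to preempt right now": a lock is held,
    we are handling an interrupt or bottom half, or we are in the scheduler.
    """

    def __init__(self):
        self.preempt_count = 0     # nesting depth of non-preemptible regions
        self.need_resched = False  # a more important task is waiting
        self.preemptions = 0       # times we would have called schedule()

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        self.preempt_count -= 1
        self._maybe_preempt()      # leaving a region is a preemption point

    def _maybe_preempt(self):
        if self.preempt_count == 0 and self.need_resched:
            self.need_resched = False
            self.preemptions += 1  # stand-in for switching to the waiting task

    # Locks and interrupt/BH handling all nest the same counter.
    spin_lock = preempt_disable
    spin_unlock = preempt_enable
    irq_enter = preempt_disable
    irq_exit = preempt_enable

    def wake_up_important_task(self):
        """Something (e.g. an interrupt) made a higher-priority task runnable."""
        self.need_resched = True
        self._maybe_preempt()
```

Nested locks simply nest the count, so releasing the inner lock is not enough; a preemption that was requested while locks were held fires on the outermost unlock, and one requested inside an interrupt handler fires on the way out of it.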
I originally felt I should stay out of any discussion here, but I want to answer some of these questions and clear some of this stuff up. To be honest, it is a little embarrassing having everyone read and comment on the interview. :)
Bottom half handlers generally are fast-track implementations to quickly deal with interrupts. To avoid concurrency collisions on resources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.
Interrupts, even just the one in question, are not disabled during a bottom half, at least in general. The reason we can't preempt bottom halves is that they are guaranteed to be serialized with respect to CPUs (ie a given BH runs on only one CPU at a time). Because of this, BHs are designed without regard to reentrancy, so we can't preempt them.
All in all, this is basic non-preemptive stuff. What I don't understand is that the strategy he is describing is a textbook NON-preemptive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully preemptive here.
Hardly. Would you say an SMP system is not SMP if it is non-concurrent inside critical sections? No, you wouldn't, and this is the same situation we have here with preemption. We can't preempt inside critical regions. We have concurrency and reentrancy concerns, just like SMP does. We also can't preempt inside interrupt handlers or bh's because they aren't designed to be preempted (nor would you want to interrupt the top half of an interrupt, anyhow).
The current kernel is not preemptive _anywhere_. In fact, the only way kernel code ever yields execution is if it explicitly does so or returns. Since with the preempt-kernel patch we can now preempt in 90% of the kernel, I think it's safe to say we have a preemptible kernel now.
I like Linus' standpoint about the preemptive kernel. I agree with him that one should eliminate the causes of latency instead of only addressing the symptoms.
Disclaimer: I am the maintainer of the preempt-kernel patch
I wouldn't say the preempt-kernel patch eliminates symptoms without addressing problems, nor would I say that is even what Linus is saying. The problem is that the kernel is not preemptible, so kernel code runs to completion, and thus long-running kernel code causes long system latencies since nothing can run in the interim.
Addressing the problem is exactly what the patch does. It causes the kernel to be preemptible and -- voila -- the problem goes away. Addressing the symptoms is the current approach: sprinkling conditional scheduling statements around the kernel in fairness to others.
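The difference between the two approaches shows up in how long a newly woken task waits behind a long-running kernel loop. The following is a hypothetical model with made-up numbers: each loop iteration takes one tick, conditional scheduling is an explicit need_resched check every `check_every` iterations, and in the preemptible case the loop can be switched out at any iteration that is not inside a locked region.

```python
def wait_ticks(loop_len, wake_at, check_every=None, preemptible=False, locked=()):
    """Ticks a task woken at `wake_at` waits before a kernel loop gives up the CPU.

    check_every: spacing of conditional scheduling points (None = no checks).
    preemptible: model a preemptible kernel, where any iteration outside a
                 locked region is a preemption point.
    locked:      iterations that run with a spinlock held.
    """
    for tick in range(wake_at, loop_len):
        if preemptible and tick not in locked:
            return tick - wake_at          # preempted right here
        if check_every and tick > 0 and tick % check_every == 0:
            return tick - wake_at          # explicit "if need_resched: schedule()"
    return loop_len - wake_at              # ran to completion before yielding
```

For a 1000-iteration loop with the task woken at tick 3, `wait_ticks(1000, 3)` gives 997, `check_every=100` gives 97, and `preemptible=True, locked=range(0, 50)` gives 47. Conditional scheduling bounds the wait by the spacing of the checks, which have to be added loop by loop; with preemption the wait is bounded by the longest locked region instead.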
Even so, I don't think Linus thinks we are addressing the symptoms and ignoring the cause. He is just interested in seeing if replacing kernel algorithms with better ones, or using conditional scheduling points, can fix the problem for most people without the need for a preemptible kernel (ie a simple solution). Otherwise, I'd bet he is quite favorable to our design.
-- Robert
U.S. residents are not currently eligible to take part (only French residents are permitted).
that is odd (although i am not doubting it): i got an email from Mandrakesoft asking me to fill out an html form with my personal information, as i was a "Community Member/Contributor" (i am a credited linux kernel contributor).
i assumed this was for IPO (or other stock) access, but the CREDITS file clearly lists me as a US resident -- so, if the stock is French-only, why did they bother? i questioned originally why i received the notice because i assumed the stock would be in a french market and thus off-limits to me.
perhaps there are some other benefits? did anyone else receive the email? i was not planning on taking advantage of the offer, but it was interesting nonetheless.
-- Robert
I don't think it gets more fair than this. KIllustrator is grossly in violation of Adobe's trademark. Slashdotters all agreed on this on the previous article.
KIllustrator is a clone of Illustrator: it's the same type of product (software), doing the same thing (vector drawing). Of course the name is meant to capitalize on Adobe's product!
Now, Adobe has several possible courses of action. The one they are taking is by far the mildest and most polite. They need to protect their trademark. They need KIllustrator to get a new name. They could take this to court; they could aim for monetary damages. They aren't.
I think we should applaud Adobe on this. They have to protect their trademark, after all. If nothing else, they are helping a KApplication get a better name.
Robert
almost sadly, this is a real config option. it's 'make adventure' in cml2 1.6.0. it is kind of neat to play with, actually.
when esr first posted the announcement for this new feature to the kernel list, I replied to him on lkml along the lines of "_WAY_ too much time on your hands, don't you have some linux advocacy to be doing? :)" ... he replied that he wrote this on a plane flight.
esr has some ideas planned for future versions; see your cml2-1.6.2/TODO.
for those of you ready to take the jump to cml2, it is available at http://tuxedo.org/~esr/cml2/. You will need python2 (RedHat PowerTools has an RPM, Debian has a DEB in stable) to use it. you will also need tkinter2 if you want to use `make xconfig'. installation is simple.
i was one of the people initially against cml2 on lkml (i didn't want to install python2, and i didn't think cml1 was broken -- hence we were creating problems), but now i am pretty impressed by cml2. it's still not a must, imo, but i like it. it should be integrated in 2.5.2.
give it a try.
-- Robert
> take SCSI
It won't budge.
> take SCSI
Seriously, it is not going to move an inch.
> take SCSI
You try, but it won't move.
> take SCSI
It moves a little!
> take SCSI
SCSI: taken.
-- Robert
I have a CD collection, and this is news to me. In fact, I doubt that anyone I know was aware that their CDs may be lunch for a putrid fungus.
How do you figure anyone with CDs will know about this?
First, I meant scarey not scarely, maybe you thought I meant scarcely?
At any rate, I meant that it is scarey -- ie, bad -- news for those of us with CD collections (although, as I say later, not a big deal). However, it is not surprising that such a fungus exists, because metal-corroding fungi are well known.
-- Robert
the Geotrichum genus is a filamentous fungus. It is typically characterized by chains of slimy spores, often with strong odors (they can stink).
It is not odd for this genus to corrode metal; Geotrichum is probably the most common type of fungus found on metal.
Geotrichum candidum was once considered a contaminant on the surface of cheeses (it would naturally grow there). Now, however, because it grows so quickly, it is used to inoculate the surface mould and encourage ripening.
While this is scarely news for anyone with a CD collection, it certainly is not surprising. Best of all, however, is that I don't think we have to worry about it too much -- if at all, especially in the US.
-- Robert
First, MS is using FreeBSD on some advertising and DNS servers -- hardly _depending_ on it for the main Hotmail stuff.
Second, I am sure they really do plan on moving to Windows.
Also, while MS does often attack all of open source, it is really the GPL they target. FreeBSD is not GPLed, so it's not their number one enemy.
So, this is not that big of a deal. I am sure Microsoft will change these systems to Windows at some point. What will be interesting is if we observe lower uptimes once everything is Windows!
-- Robert
From the article (in reference to SMP OS support):
There are exceptions to this rule, such as the Mac, where Photoshop has a patch which has multithreaded support for G4's even though no Mac OS below Mac OS X supports multiprocessing. Generally speaking, the computer needs OS support as well as application support to take advantage of multiprocessing.
Maybe I am not reading this correctly, but is it saying that a patch to an APPLICATION gave the OS SMP support? I don't see how this could work ... SMP support needs to come from the OS. Not only would the OS need to be designed from the ground up to support multiple CPUs (ie, the scheduler, interrupts, etc.), but it would need to be thread-safe itself (proper locking, etc.).
So, either tell me "yah, of course, the article is wrong" or can anyone explain how a patch to PhotoShop could give it SMP support on a non-SMP-aware OS?
Thanks,
Robert
that is not how copyright licenses work. the GPL is not a contract, it is a license to use a copyrighted work.
if it were a contract, you would have to agree to it. since it's not, you don't.
instead, before the license, you have no rights. you don't own the material. the GPL grants you permission to use the copyrighted material, so your use implicitly shows agreement with the GPL (or violation of copyright).
robert
My understanding is that gcc does not do as good a job optimizing as some other compilers.
actually, gcc is one of the most proficient optimizing compilers around. i also don't see how "not scientific" translates to "any dumb mistake is permissible" -- if a floating point operation (which was certainly optimized away by ANY compiler) changed the results, how is it "all the same"?
All in all, this is the best comparison between FreeBSD and Linux I've seen yet.
because it gives the results you want?
there are a few reasons ... one of the main reasons is that connection reporting falters with syn cookies enabled. since there are no TSBs until the connection is made (in other words, there is nothing stored about a connection during the connection process) the only way to log the events is via filtering with something like tcpdump. during a heavy syn flood, the sniffer may drop packets. not a big deal, in my opinion. the Configure.help for CONFIG_SYN_COOKIES mentions this.
:>
originally it was a concern that syn cookies would impose a noticeable performance hit, but currently linux uses a hash to generate the initial syn anyhow, so it isn't a big deal.
there are some other technical issues, although linux's implementation is supposed to be free of most (all?) of them. see my previously posted link for the e-mail discussion of the implementation, where the issues are discussed. the "new" proposal actually has some issues ours does not. Alan Cox posted to linux-kernel earlier mentioning some flaw. a year earlier, a magnitude better -- linux
currently, the config option defaults to off, and even when it is enabled, the sysctl defaults to off (you need to "echo 1 > /proc/sys/net/ipv4/tcp_syncookies" to enable them).
i think i agree, because it is an extra feature that should default to off (or just be a standard, non-toggleable part of the OS) .. but if you do enable it, the sysctl (a Good Thing) should default to on, imo.
robert m love
my initials at tech9 dot net
we put this into the linux kernel in early 2.1. the define is CONFIG_SYN_COOKIES and it also needs a sysctl/proc option to be enabled (net.ipv4.tcp_syncookies)
... it works.
basically, you make your syn (initial sequence number) a function of the connection's data (local and remote ports, local and remote addresses, and the client's syn) and some secret. then when the ACK returns carrying your syn (and all the original data), you can recompute the function to see if it matches.
if you make this function a cryptographically strong one-way hash and use a good secret, the cookie is hard to predict.
in linux, we do exactly this:
our_initial_syn = one_way_hash(src.port, dst.port, src.host, dst.host, secret) + src.initial.syn;
by adding on the initial syn, rather than using it as an argument to the hash, we keep our syns properly spaced. the hash also takes a counter that is incremented every minute, which doubles as a way of checking for old acks.
the TSB is not created until the final ack is received. the MSS is encoded in 3 bits of our syn (so our syn is 2^28 bits secure). no TSBs until the ACK means there is no queue to fill up.
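A minimal Python sketch of a SYN cookie along these lines follows. It is not the Linux code: HMAC-SHA256 stands in for the kernel's hash, and SECRET, MSS_TABLE and the function names are hypothetical. As described above, a keyed hash is added to the client's initial syn, a per-minute counter rejects stale ACKs, and a 3-bit index into an 8-entry MSS table is folded in, so the server stores nothing until the final ACK. The exact bit layout (an 8-bit minute counter in the top bits, a second keyed hash plus the MSS index in the low 24 bits) is an assumption of this sketch.

```python
import hashlib
import hmac

SECRET = b"hypothetical server secret"  # a real server would pick this randomly
COOKIEBITS = 24                          # low bits: second hash + MSS index
COOKIEMASK = (1 << COOKIEBITS) - 1
# 8 entries, so the MSS index fits in 3 bits (values are illustrative)
MSS_TABLE = [64, 256, 512, 536, 1024, 1440, 1460, 4312]


def keyed_hash(*fields):
    """One-way keyed hash of the connection fields, truncated to 32 bits."""
    msg = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")


def make_cookie(saddr, daddr, sport, dport, client_isn, minute, mss):
    """Return our initial syn for a SYN, without storing any state."""
    # Largest table entry not above the client's MSS (index 0 if none fit).
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= mss), default=0)
    base = keyed_hash(saddr, daddr, sport, dport, "base")
    low = (keyed_hash(saddr, daddr, sport, dport, "mss", minute) + idx) & COOKIEMASK
    cookie = base + client_isn + ((minute & 0xFF) << COOKIEBITS) + low
    return cookie & 0xFFFFFFFF


def check_cookie(cookie, saddr, daddr, sport, dport, client_isn, now_minute, max_age=4):
    """Validate a cookie taken from the final ACK (ack number - 1).

    client_isn is the ACK's sequence number - 1. Returns the encoded MSS,
    or None if the cookie is forged or older than max_age minutes.
    """
    c = (cookie - keyed_hash(saddr, daddr, sport, dport, "base") - client_isn) & 0xFFFFFFFF
    age = (now_minute - (c >> COOKIEBITS)) & 0xFF  # minutes since the SYN
    if age >= max_age:
        return None
    minute = now_minute - age
    idx = (c - keyed_hash(saddr, daddr, sport, dport, "mss", minute)) & COOKIEMASK
    return MSS_TABLE[idx] if idx < len(MSS_TABLE) else None
```

For example, a cookie made with `minute=1000` and `mss=1460` validates a minute later and yields 1460, but is rejected if the ACK shows up ten minutes later. Because the hash cannot be inverted, the check recomputes it and subtracts, which is why a forged cookie almost never decodes to a valid counter and MSS index.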
see http://cr.yp.to/syncookies.html for the initial discussion of implementation.
robert m love
my initials at tech9 dot net
Wrong. Windows NT is a microkernel with a full multi-threaded server setup, similar (at least in theory) to HURD. The POSIX and OS/2 subsystems (server processes that run on top of the central kernel) were never expanded because no one cared. You can buy a full POSIX2-compliant layer for NT (i forgot the name..) but it is not too popular, proving my point. NT has been ported to (and released on) x86, Alpha, MIPS, and PPC, but the non-x86 ports were canceled due to a complete lack of public support (although Alpha was mildly popular, that was more political). There are Win64 versions for Alpha (canceled) and Merced. And who knows what else MS researches, but w2k will be x86 only. NT is a fairly robust and powerful system.
Robert (sagei)