The New Linux Speed Trick
Brainsur quotes a story saying: "Linux kernel 2.6 introduces improved IO scheduling that can increase speed -- 'sometimes by 1,000 percent or more, [more] often by 2x' -- for standard desktop workloads, and by as much as 15 percent on many database workloads, according to Andrew Morton of Open Source Development Labs. This increased speed is accomplished by minimizing the disk head movement during concurrent reads."
I'm having trouble getting ACPI working on my laptop with the 2.6 kernel (it's a bad implementation on the part of my laptop). The 2.4 series used to work (sometimes), so I installed both Mandrake's 2.4 and 2.6 kernels on my laptop. Going back to 2.4.x was like switching from a sports car to a horse and buggy; KDE was that much faster with the 2.6.x kernel running the show.
Whatever happened to caching? If you can anticipate the head movement, surely you have already read the data before and it should be in the cache?
Don't SCSI drives do this themselves?
Linux Devices has an article on the 2.6 network features here: http://linuxdevices.com/articles/AT7885999771.html
Open Source PVR Hardware Database
It seems there are two I/O schedulers you can choose from at boot time.
"The anticipatory scheduling is so named because it anticipates processes doing several dependent reads. In theory, this should minimize the disk head movement. Without anticipation, the heads may have to seek back and forth under several loads, and there is a small delay before the head returns for a seek to see if the process requests another read."
"The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving the read requests a better deadline than write requests, thus ensuring that applications rarely will be delayed by [write] requests."
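To make the quoted description a bit more concrete, here is a minimal Python sketch of a deadline-style dispatcher. It is a toy model under stated assumptions, not the kernel's code: each request has a sector and an arrival time, reads expire after an assumed 500 ms and writes after an assumed 5000 ms, and the dispatcher normally sweeps upward through a sector-sorted queue, but serves the oldest request of the read or write FIFO first once its deadline has passed. `Request` and `DeadlineScheduler` are hypothetical names.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Assumed expiry times for this sketch; reads get a much shorter deadline.
READ_EXPIRE_MS = 500
WRITE_EXPIRE_MS = 5000


@dataclass
class Request:
    sector: int
    arrival_ms: int
    is_read: bool

    @property
    def deadline_ms(self) -> int:
        expire = READ_EXPIRE_MS if self.is_read else WRITE_EXPIRE_MS
        return self.arrival_ms + expire


class DeadlineScheduler:
    """Toy model: one sector-sorted queue plus a FIFO read queue and a FIFO write queue."""

    def __init__(self) -> None:
        self.sorted_queue: list[Request] = []
        self.fifo: dict[bool, deque[Request]] = {True: deque(), False: deque()}

    def add(self, req: Request) -> None:
        self.sorted_queue.append(req)
        self.sorted_queue.sort(key=lambda r: r.sector)
        self.fifo[req.is_read].append(req)  # True -> read FIFO, False -> write FIFO

    def _remove(self, req: Request) -> None:
        self.sorted_queue.remove(req)
        self.fifo[req.is_read].remove(req)

    def dispatch(self, now_ms: int, head: int) -> Request | None:
        # 1. An expired request jumps the queue; the read FIFO is checked first.
        for is_read in (True, False):
            fifo = self.fifo[is_read]
            if fifo and fifo[0].deadline_ms <= now_ms:
                req = fifo[0]
                self._remove(req)
                return req
        # 2. Otherwise sweep upward from the head position, wrapping to the lowest sector.
        if not self.sorted_queue:
            return None
        ahead = [r for r in self.sorted_queue if r.sector >= head]
        req = ahead[0] if ahead else self.sorted_queue[0]
        self._remove(req)
        return req
```

In this model the short read deadline is what stops a long run of nearby writes from starving a read that sits far away on the disk: once the read has waited past its deadline, it is served regardless of head position.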
Nice, but this is making things more complex. I admit I'll just leave all kernel settings wherever Mandrake sets them. Will other people play about and specialise their systems for the tasks they perform?
- Jax
Is there any reason why the prediction code (anticipatory scheduler) and the extra queues (deadline scheduler) couldn't be combined in a single scheduler to give us the best of both worlds?
The Tao of math: The numbers you can count are not the real numbers.
When I had an Amiga (around '91), even though it was fully multitasking, I learnt never to open an app while another was loading. If you did, you could hear the disk head moving back and forth between two sectors on the disk every half second or so, slowing both app launches to a crawl. Waiting until one had loaded before launching the second was many times faster.
I've always wondered why there wasn't something in the OS to enforce this behaviour, i.e., making sure that app 2's access to the disk is queued until app 1 has finished. Isn't this one of the reasons Windows takes ages to boot (many processes all competing for the one disk)?
I've actually found that on my machine, a pretty much standard desktop, response is a lot slower on 2.6.5 than on 2.4.22. Not sure if I got something wrong in the compile, but moving the mouse and the like seems a lot jerkier under load. I use a USB mouse and keyboard, so maybe that's part of it. Has anyone else seen something similar?
Obviously, this was stolen from SCO. This was based on their UNIX software and was available in the baseline from 10 years ago. It only shows that Linux, once again, is not an innovator, but just copies code from SCO to achieve its scalability.
>is accomplished by minimizing the disk head movement
I was always under the impression that modern hard drive designs hide the physical disk layout from the PC. So how can software predict where the heads are?
Karma? Hey I just call it as I see it.
>Try going outside. Find out about these things called "women".
And this would help my computer how?
I think Solaris 10 (or maybe a later version, I can't remember) is supposed to support a concept of Quality of Service applied to disk accesses.
Is anyone in the Linux world considering this?
This is probably more applicable to the enterprise market, but surely any scheme of informing the scheduler about the expected disk transfer characteristics has to improve performance.
On the other hand, it might just be Sun reinventing buzzwords to sell their products.
[ Monday is a terrible way to spend one seventh of your life. ]
Here's an older benchmark made by Andrew Morton showing the anticipatory scheduler vs the previous one.
The benchmark was made before 2.6.0, but I still think it shows the big difference from the 2.4 IO scheduler.
Quote:
Executive summary: the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster.
Stealing what? The algorithms?
The end-(Windows)-user benefits from it.
That's the price of freedom.
And any additions MS makes to the code must be made public.
So then everybody benefits.
If you mod this up, your slashdot background will turn into a beautiful sunset!
It's great watching the "modern" computer industry discover all the toys and optimisations that were essential engineering for the systems I used to use in the '70s and '80s.
All the wonderful stuff like disk seek optimisation and interleaved memory (even the MMU came to the modern computer about 15 years after everyone else had it) were technologies that made systems stand out from each other.
Because of the speed of things these days, lots of that tech has been largely ignored, until now, when we're starting to hit hard performance barriers again. Now we have to invent the technology of the '70s all over again. It's nice to see all this stuff coming back, though.
The NT scheduler has been O(1)-like for, eh, forever.
>Our kernel produces far superior performance due to providing hooks for the COM layer
Yeah, whatever. There is no COM anywhere near the NT kernel, and the latest and greatest from Microsoft, the .NET framework, isn't even based on COM anymore
Nice troll...
The cfq scheduler in the -mm (Andrew Morton) trees gives very good results in desktop use.
With anticipatory or deadline, I experience awful skips with artsd under KDE 3.2 every time there is heavy disk access, but they're [almost] completely gone with cfq.
To use it, compile an -mm kernel and add 'elevator=cfq' to the kernel boot parameters through LILO or GRUB.
See this LWN article for more info.
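A toy Python sketch of the general idea behind CFQ as I understand it, which would explain the artsd result: each process gets its own request queue, and the dispatcher takes turns between processes, serving up to a small quantum from each. The quantum of 2 and the process name "make" are hypothetical; the real scheduler's details differ.

```python
from __future__ import annotations

from collections import deque


def cfq_order(requests: list[tuple[str, int]], quantum: int = 2) -> list[tuple[str, int]]:
    """Dispatch (process, sector) requests round-robin over per-process queues.

    Each process gets its own FIFO queue; on each turn a process may dispatch
    up to `quantum` requests before the next process gets its turn.
    """
    queues: dict[str, deque[int]] = {}
    for pid, sector in requests:
        queues.setdefault(pid, deque()).append(sector)

    order: list[tuple[str, int]] = []
    while queues:
        for pid in list(queues):  # copy: empty queues are deleted during the loop
            q = queues[pid]
            for _ in range(min(quantum, len(q))):
                order.append((pid, q.popleft()))
            if not q:
                del queues[pid]
    return order


# A big compile floods the queue before the sound daemon asks for anything.
reqs = [("make", s) for s in range(10, 16)] + [("artsd", 900), ("artsd", 901)]
print(cfq_order(reqs))
# [('make', 10), ('make', 11), ('artsd', 900), ('artsd', 901), ('make', 12), ('make', 13), ('make', 14), ('make', 15)]
```

With a single FIFO, artsd's two reads would wait behind all six of make's; here they go out on the first round.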
-- don't discount flying pigs until you have good air defense
No: stealing the concept (probably by analysing the code) and writing it themselves.
So if MS make any improvements in their own implementation of the concept, the code would not be made public, and MS benefits while nobody else does.
To elaborate (and in some ways I believe this is what SCO are arguing), let's say I see an open source application that does something neat. It probably won't be patented, because the author expects people to contribute any modifications back. But let's say I don't, because I'm a greedy commercial corporation, and so I effectively copy the IDEAS behind the application. My code may look quite similar to theirs, but I certainly have not infringed on the GPL (or have I? I'm no lawyer!).
So what if this neat application had an "open source patent", such that anyone infringing on it would not be liable for millions, but would instead be forced to open up the source code of their particular implementation?
Let me start by claiming that optimizing desktop performance is all about optimizing I/O patterns (contrary to what all Gentoo users think :P). My KDE startup is about three times as fast when everything is in the disk cache, so it is clear where the bottleneck is. (Just try logging in to KDE after boot, then log out and log in again.) A concentrated effort at
- passing on the right hints from KDE via glibc to the kernel (e.g. an madvise() call when loading executables, hinting that most of the file will probably be needed later on),
- doing some anticipatory reading of config files/libraries etc. from startkde where it is known that they will be needed, and where they hopefully lie contiguously on the disk,
- optimizing disk layout for the common access patterns
would IMHO make a far bigger difference to the desktop experience than optimizing compiler flags by using Gentoo or using a preemptible kernel. There has been a lot of discussion about this on the kde-optimize list (with Andrew Morton participating), so maybe we can hope that KDE 3.3 will offer some improvements.
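The "anticipatory reading" idea can be sketched with a real call, though a slightly different one from madvise(): madvise() works on memory mappings, while posix_fadvise() with POSIX_FADV_WILLNEED asks the kernel to start pulling a whole file into the page cache ahead of time. A minimal Python sketch, assuming Linux and Python 3.3+ (os.posix_fadvise is not available on every platform); the `prefetch` helper is hypothetical, and any list of paths fed to it would be too.

```python
from __future__ import annotations

import os


def prefetch(paths: list[str]) -> list[str]:
    """Hint the kernel to read these files into the page cache; return the ones hinted."""
    hinted = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # missing or unreadable files are simply skipped
        try:
            # offset 0, length 0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted.append(path)
        finally:
            os.close(fd)
    return hinted
```

The call only queues readahead and returns; whether the reads end up contiguous still depends on how the files are laid out on disk, which is the third point in the list above.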
As an aside, yes, we all hate the Windows registry, but I think we should admit that for boot time optimization it is the right thing to do (having everything in one file that is laid out in one contiguous block on the disk).
AFAIK the "anticipation" bit is not so much about predicting head movement as about reducing it. Reads cause processes to block while waiting for the data (and can thus stall processes for long periods if not scheduled appropriately), whereas writes are typically fire-and-forget. This last bit means that you can usually just queue them up, return control to the user program, and perform the actual write at some more convenient time, i.e. later. Since reads (by the same process) are usually also heavily interdependent, it is also a win to schedule them early from that POV.
That's my understanding of it.
HAND.
Alternatively, have multiple read-heads on a single arm. 3 would be a good number. The idea here would be that you could pre-seek either side of the disk, before finishing a read by the currently-active arm.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Doesn't this involve a green marker, and tracing along the edge of the hard drive? Faster and less distortion?
Your comment is meaningless gibberish studded with technobabble.
I believe you, you must really work at Microsoft.
This messing with the I/O queue may make things interesting for the journalling process, which is kind of vital to integrity. File placement could become even more important for this (as could the placement of journal/log files).
The rest seems to just effectively be a modified elevator (wait a bit before moving).
See my journal, I write things there
Zealotry is all fine and dandy, but delusional zealotry just lands people in jail.
You need help, buddy.
Would it not be possible to write a very basic adaptive network that "learns" what the best values for these parameters are for each individual machine, based on a history of its workload?
Invoicing, Time Tracking, Reporting
So, in turn, should the Linux community cease developing/including things that are "inspired" by Windows?
It's kind of sad that the free software advocates sometimes get so carried off by their pathological hatred for Microsoft and corporations that they don't see that they're about to become "the enemy" themselves.
Free is free. If you start to restrict the use and availability of your code by requiring the release of any modifications to the public, it's not free code anymore - no matter what RMS says.
The owls are not what they seem
Thanks but my father is Croatian and my Mom's French :o)
Anyway, you found out that I indeed am not a native English speaker, hence the neologistications.
Trolling using another account since 2005.
2CPU.com has a Linux kernel comparison of 2.6.4 and 2.4.25 on a SMP system with interesting results.
Elevator seeking means looking at the current request queue and bundling requests that are close together to minimise head movement. This is indeed old. IIRC, Linux has had it since 2.2-something.
The anticipatory scheduler tries to anticipate future requests (who would have guessed that?), and is relatively new.
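A minimal Python sketch of the old elevator idea described above: sort the pending requests, serve everything in the current direction of travel, then sweep back. Strictly this is the LOOK variant of the textbook elevator (it turns around at the last request rather than at the edge of the disk); it is a model, not the kernel's code, and the sector numbers are made up.

```python
def elevator_order(head: int, pending: list[int]) -> list[int]:
    """Serve sectors at or above the head in ascending order, then sweep back down."""
    up = sorted(s for s in pending if s >= head)
    down = sorted((s for s in pending if s < head), reverse=True)
    return up + down


def head_travel(start: int, order: list[int]) -> int:
    """Total distance the head moves when sectors are served in the given order."""
    travel, pos = 0, start
    for s in order:
        travel += abs(s - pos)
        pos = s
    return travel


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival order
print(elevator_order(53, queue))  # [65, 67, 98, 122, 124, 183, 37, 14]
print(head_travel(53, queue), head_travel(53, elevator_order(53, queue)))  # 640 299
```

Note that the elevator only reorders what is already queued. If a process has just one read outstanding at a time, there is nothing nearby to bundle, which is the gap the anticipatory scheduler is aimed at.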
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Desktop Linux needs a scheduling policy specific to interactivity. I guess this may happen the day a decent interface gets slapped on the Linux base. Until then, we dance the same dance - every release is faster than the previous one by the benchmarks, and feels more horrid than the previous one.
Surprise: the Mac now has the same responsiveness problem thanks to its Unix (Mach) kernel, whereas the previous Mac OS 9 crashed regularly and couldn't multitask, but had a much snappier user experience. Apple has been addressing this issue, which they recognize, for two years now, and has almost, but not quite, fixed it with the current Panther release.
It is time we found a way to benchmark a user experience in order to prevent over-optimisation for number-crunching.
Most of my posts get marked down as trolls - think hard: How can you solve a problem if you refuse to admit it exists?
This is not a signature.
Yeah, the same thing happens under Windows if you read from CD-ROM. The whole thing just slows to a crawl if you try to read two files at once. I'd assume it's a hardware problem, (long seek times, large error margins) not necessarily Windows' fault, but I don't use CDs much anymore (hooray for ethernet and huge hard drives) so I don't know.
Of course, this raises the point that aligning the data on a game CD or DVD for a console is a science in itself. PC game development is easy in comparison! (plonk everything on the hard drive)
phil
...that the Red Hat "kernel development systems engineer"'s name is Stephen Tweedie, not Tweed :)
Firstly, the 2.6 kernel allows pre-emptive scheduling. Supposedly it was introduced because Linus got tired of his mp3s skipping while he compiled things.
Second, Linux doesn't need a defrag utility. Linux filesystems (Ext2 and Ext3) allocate files properly, using clustering and inodes. The need to defrag comes from the bad design of FAT, which works great on an 8088 processor with tiny files on a 1Meg drive, but is terribly inefficient on anything past a 386.
Of course, there does exist a 'defrag' utility for linux. It just won't gain you much at all.
No, this is not the elevator algorithm. This is an anticipatory algorithm that pre-queues reads that it expects the application to do in the future. Linux already has the elevator algorithm - had it before Windows, I believe.
Consciousness is an illusion caused by an excess of self consciousness.
However, I wouldn't even try that on RedHat or Mandrake without having the .config file and a list of distribution specific patches.
This was on a Celeron 1GHz laptop, and honestly, I couldn't tell the difference in speed beyond any custom compile. Custom meaning unnecessary device drivers are removed, and the ones that I need are compiled in (as opposed to remaining modularized).
Kinetic stupidity has a new brand leader: Allen Zadr.
So how does this affect IDE drives in terms of I/O read/write accuracy? We use SCSI drives for low-level mass data processing and mining because what you write to the disk is guaranteed to be what you can read back from it in the future.
IDE disks don't have the same guarantee. Does the new 2.6 kernel improve this?
I also wonder if this reduces hard drive wear for longer lifetimes....
Veni Vidi Vici
M.
--
Monete Italiane
And if you look above this post, you can all see a good many decent explanations of what a 1000% increase actually means (11%).
Kinetic stupidity has a new brand leader: Allen Zadr.
OK, I know this is evil and all, but let's say MS decide to implement this as a concept (so without "stealing" code)... the Linux community will have given them something and received (probably) nothing in return.
Not to burst your bubble, but the NT scheduler already implements predictive disk I/O concepts.
Nice that Linux is finally catching up though...
Aside from much better I/O performance, 2.6.x also has much better performance on my notebook (IBM T-series ThinkPad).
I don't know if it's due to SpeedStep support being in the kernel or what, but when I was running 2.4.x with the pre-emptible kernel patches, switching from wall power to battery power meant massive slowdowns, as though I had switched from a PIII-1GHz to a 100MHz Pentium classic. Simple commands like "ps" would take seconds to complete and screen redraws were visible. The whole system would feel like sludge. In spite of this fact, battery life was relatively poor. The combined effect (much slowed system, very short battery life) meant that it was difficult to get anything at all done on battery power.
Now with 2.6.x, when I switch to battery power, there is no perceptible slowdown whatsoever when compared to wall power, and battery life is much improved. Downside: suspending 2.6.x kills USB-uhci, so I've had to compile it as a module and hack up my suspend/resume scripts to reload it each time. But for the speed increase, it's well worth the trouble.
STOP . AMERICA . NOW
It is new with respect to 2.4.x. The anticipatory scheduler was introduced in 2.5.x-mm and made its way into the kernel by the time 2.6 was released.
Program Intellivision!
Use 'elevator=as' (or cfq, or deadline) as a kernel boot parameter.
The anticipatory scheduler is the default for the vanilla 2.6 kernel.
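A small Python sketch of how the two posts above fit together: an `elevator=` token on the kernel command line picks the scheduler, and without one you get the anticipatory default. The helper name is hypothetical, the `root=/dev/hda1` values are made up, and this sketch does not check whether the name is one the kernel actually knows.

```python
def selected_elevator(cmdline: str, default: str = "as") -> str:
    """Return the I/O scheduler requested via elevator= on a kernel command line."""
    choice = default  # anticipatory ("as"), the vanilla 2.6 default per the post above
    for token in cmdline.split():
        if token.startswith("elevator="):
            choice = token.split("=", 1)[1]  # in this sketch the last elevator= wins
    return choice


print(selected_elevator("ro root=/dev/hda1 elevator=cfq"))  # cfq
print(selected_elevator("ro root=/dev/hda1"))               # as
```

On a running box the string to feed it is the contents of /proc/cmdline, which shows the parameters LILO or GRUB actually passed.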
Meep.
The point of the new scheduler(s) is that most access to the disk by a process is sequential (i.e. many blocks at a time), so if another process wants to access some other part of the disk, it most often pays off to let that process wait for a while before serving it, since the original process will most likely want more data from near its current position.
That way, you don't need to move the head nearly as much as if you responded directly to the other process.
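A toy Python simulation of that argument, which also covers the Amiga experience described earlier in the thread. Model assumptions, stated loudly: each process has at most one read outstanding and issues the next only after the previous completes, and its "think time" always fits inside the anticipation window (the best case for anticipation). The function names and the two hypothetical programs "a" and "b" are made up for illustration.

```python
from __future__ import annotations


def service_order(streams: dict[str, list[int]], anticipate: bool) -> list[int]:
    """Return the order in which a disk serves dependent sequential readers.

    Model assumptions: each process has at most one read outstanding and issues
    its next read only after the previous one completes; its "think time" is
    always shorter than the anticipation window.
    """
    next_idx = {pid: 0 for pid in streams}
    order: list[int] = []
    last = None  # process whose read was served most recently
    while True:
        waiting = [p for p in streams if next_idx[p] < len(streams[p])]
        if not waiting:
            return order
        if anticipate and last in waiting:
            # Anticipation: idle briefly, and the same process's next read arrives.
            pid = last
        else:
            # No anticipation: at the instant the last read completes, that process
            # has not issued its next read yet, so another waiting process is served.
            others = [p for p in waiting if p != last]
            pid = others[0] if others else last
        order.append(streams[pid][next_idx[pid]])
        next_idx[pid] += 1
        last = pid


def long_seeks(order: list[int], threshold: int = 1000) -> int:
    """Count head movements longer than `threshold` sectors."""
    return sum(1 for a, b in zip(order, order[1:]) if abs(b - a) > threshold)


# Two hypothetical programs, each reading its own file front to back.
streams = {"a": [100, 101, 102], "b": [5000, 5001, 5002]}
for anticipate in (False, True):
    order = service_order(streams, anticipate)
    print(anticipate, order, long_seeks(order))
# False [100, 5000, 101, 5001, 102, 5002] 5
# True [100, 101, 102, 5000, 5001, 5002] 1
```

In reality the scheduler only waits a few milliseconds and gives up if no nearby request shows up, so the gain depends on processes actually coming back quickly.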
Robert Love has written an excellent article about the new schedulers here: I/O Schedulers
Meep.
If you compile glibc with NPTL support, you'll see even more of the new kernel in action. I quote from LinuxJournal.com:
NPTL brings an eight-fold improvement over its predecessor. Tests conducted by its authors have shown that Linux, with this new threading, can start and stop 100,000 threads simultaneously in about two seconds. This task took 15 minutes on the old threading model.
You are right, there is nothing new there... if they were talking about elevator seeking style movement. This article is about Linux making use of Anticipatory Scheduling which is completely different and quite new.
Do you ever question what "boot" means? When a Linux system lets you log in, EVERYTHING is already started and running. When Windows shows you the desktop, there is still a ton of stuff getting started in the background (or the foreground even) and it's still unusable. Windows doesn't start any faster, it just shows you pretty pictures sooner.
My blog. Good stuff (when I remember to update it). Read it.
And we all would have benefited from this if they had simply shared it in the first place instead of everyone spending 20-30 years "rediscovering" it.
One programmer likened the '70s and '80s to the Dark Ages. There were cabals and secret voodoo that people sat on and didn't share, and you ended up with ignorant masses who only thought "this is as good as it gets". Hopefully this renaissance sticks, because it doesn't matter how good or cool your technology is if you bury it for 20 years without another person knowing.
The original research for anticipatory disk scheduling was done at Rice University by Sitaram Iyer and Peter Druschel and is described here.
If you don't care about last access times on your files, then you should consider mounting your filesystems with the noatime mount flag in /etc/fstab.
Reading a file under noatime means that the kernel does not need to go back and update the last access time field of that file's inode. Sure, multiple reads over a span of a few seconds will only cause the in-core inode to be modified, but eventually that modified inode must be flushed out to disk. Why cause an extra write to the disk for a feature that you might not care about?
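A minimal Python sketch of the fstab change itself: noatime goes into the fourth (options) field of an entry. The helper is hypothetical, and so is the `/dev/hda2 /home` entry used below; back up /etc/fstab before editing it by any means.

```python
def add_noatime(fstab_line: str) -> str:
    """Return an /etc/fstab entry with noatime added to its options (4th) field."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # blank lines and comments pass through untouched
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line  # no options field to edit
    options = fields[3].split(",")
    if "noatime" not in options:
        options.append("noatime")
    fields[3] = ",".join(options)
    return " ".join(fields)  # note: whitespace is normalised to single spaces


print(add_noatime("/dev/hda2  /home  ext3  defaults  1 2"))
# /dev/hda2 /home ext3 defaults,noatime 1 2
```

The change takes effect at the next mount; `mount -o remount,noatime /home` applies it to an already mounted filesystem.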
For example, think about those cron jobs/progs that scan the file tree (tmpwatch, updatedb, etc.). Unless you mount with the noatime option, your kernel must at least update the last access time field of every directory's inode! Think about those /etc files that are frequently read (hosts, hosts.allow, DIR_COLORS, resolv.conf, etc.) or the dynamic shared libs (libc.so.6, ld-linux.so.2, libdl.so.2, etc.) that are frequently used by progs. Why waste write-ops updating their last access time fields?
Yes, the last access time field has some uses. However, the cost of updating those last access timestamps, IMHO, is seldom worth the extra disk ops.
There are other advantages to using the noatime mount option... however, to wind up this posting I'll just say that I always mount my ext3 filesystems with the noatime mount flag. I recommend that you consider looking into this option if you don't use it already.
chongo (was here)
I know there is a boot-time switch for changing the I/O scheduler, but I still believe you are stuck with one for all devices. How about using different algorithms for different partitions? There is quite a lot of difference between a database device, a filesystem holding binaries, shared libraries, /tmp, spool directories etc. etc. etc. When I/O schedulers are so different in their theoretical foundations, why do you have to choose only one?
This should be a mount option, not a boot option.
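For what it's worth, later 2.6 kernels did add a per-device (not per-mount) switch: each block device gets a /sys/block/<dev>/queue/scheduler file that lists the available schedulers with the active one in brackets, and writing a scheduler name to it as root changes the scheduler for that device. The exact kernel version that introduced it is worth checking. A small Python sketch for reading that format; the helper names are hypothetical, and so is the "hda" device name.

```python
from __future__ import annotations

import re
from pathlib import Path


def parse_scheduler(text: str) -> tuple[list[str], str | None]:
    """Split a queue/scheduler file into (available schedulers, active scheduler)."""
    available = [name.strip("[]") for name in text.split()]
    match = re.search(r"\[([^\]]+)\]", text)
    return available, (match.group(1) if match else None)


def device_scheduler(device: str) -> tuple[list[str], str | None]:
    """Read the scheduler file for a block device such as "hda" (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


print(parse_scheduler("noop anticipatory deadline [cfq]\n"))
# (['noop', 'anticipatory', 'deadline', 'cfq'], 'cfq')
```

Partitions on the same disk share one request queue, so this still would not give a different scheduler for /tmp and the database if both live on one drive.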
What is the sound of one hand clapping?
cat