The New Linux Speed Trick
Brainsur quotes a story saying "
Linux kernel 2.6 introduces improved IO scheduling that can increase speed -- "sometimes by 1,000 percent or more, [more] often by 2x" -- for standard desktop workloads, and by as much as 15 percent on many database workloads, according to Andrew Morton of Open Source Development Labs. This increased speed is accomplished by minimizing the disk head movement during concurrent reads.
"
I'm having trouble getting ACPI working in my laptop in the 2.6 kernel (it's a bad implementation on the part of my laptop). The 2.4 series used to work (sometimes) so I installed Mandrake's 2.4 kernel and 2.6 kernels on my laptop. Using 2.4.x again was like switching to a horse and buggy from a sport-cars; KDE was that much faster with the 2.6.x kernel running the show.
Whatever happened to cache. If you can anticipate the head movement surely you have already read the data before and it should be in the cache????
Dont SCSI drives do this themselves?
It seems there are two IO modes you can choose from, at boot time.
"The anticipatory scheduling is so named because it anticipates processes doing several dependent reads. In theory, this should minimize the disk head movement. Without anticipation, the heads may have to seek back and forth under several loads, and there is a small delay before the head returns for a seek to see if the process requests another read. "
"The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving the read requests a better deadline than write requests, thus ensuring that applications rarely will be delayed by read requests."
Nice, but this is making things more complex. I admit I'll just keep all kernel settings at wherever Mandrake sets them as. Will other people play about and specialise their system for the task that it does?
- Jax
Is there any reason why the prediction code (anticipatory scheduler) and the extra queues (deadline scheduler) couldn't be combined in a single scheduler to give us the best of both worlds?
The Tao of math: The numbers you can count are not the real numbers.
Seems OK to me, that's a 10x improvement, and that was the theoretical high end example. Since they said it would commonly increase speed by 2x, and 15% for databases, it seems right in line.
I suppose that since database data is generally grouped together and read in a big chunk there's less room for improvment.
When I had an Amiga (aroung '91ish), even though It was fully multitasking, I learnt to never open any app while another was loading. If you did, you could hear the disk head moving back and forward between two sectors on disk every half second or so, slowing both app launches to a crawl. Waiting until one loaded, and launching the second was many times faster.
I've always wondered why there wasn't something in the OS to force this behaviour, Ie, making sure that App 2 access to the disk is queued until app 1 has finished. Isn't this one of the reasons Windows takes ages to boot? (many processes all competing for the one disk resource?).
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Obviously, this was stolen from SCO. This was based on their UNIX software and was available in the baseline from 10 years ago. It only shows that Linux, once again, is not an innovator, but just copies code from SCO to achive its scalability.
>Try going outside. Find out about these things called "women".
And this would help my computer how?
I think Solaris 10 (or maybe a later version, I can't remember) is suppose to support a concept of Quality of Service applied to disk accesses.
Is anyone in the Linux world considering this ?
This is probably more applicable to the enterprise market, but surely any scheme of informing the scheduler about the expected disk transfer characteristics has to improve performance.
On the other hand, it might be just Sun trying to re-invent uses of buzz words to sell their products.
[ Monday is a terrible way to spend one seventh of your life. ]
Isn't 1000%, 11x? ..
15% = 1.15x
100% = 2x
200% = 3x
300% = 4x
900% = 10x
1000% = 11x
a % = (a+100)/100 x
Here's an older benchmark made by Andrew Morton showing the anticipatory scheduler vs the previous one.
The benchmark was made before 2.6.0, but I still think it shows the big difference from the 2.4 IO scheduler.
Quote:
Executive summary: the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster.
Stealing what? The algorithms?
The end-(Windows)-user benefits from it.
That's the price of freedom.
And any additions MS makes to the code must be made public.
So then everybody benefits.
If you mod this up, your slashdot background will turn into a beautiful sunset!
1,000 percent is 10x, but 1,000% improvement, being improvement by 10x, is 11x as good.
Just as 50% is half, but 50% improvement is three halves as good.
The Tao of math: The numbers you can count are not the real numbers.
It's great watching the "modern" computer industry discover all the toys and optimisations that where essential engineering for the systems I used to use in the '70s & '80s.
All the wonderful stuff like disk seek optimisation, interleaved memory (Even MMU came to the moden computer about 15 years after everyone else had it) were technologies that made systems stand out from each other.
Because of the speed of things these days, lots of that tech has been largely ignored, until now when we're starting to hit hard performance barriers again. Now we have to invent the technology og the '70s all over again. It's nice to see all this stuff comming back though.
The NT scheduler has been O(1) like, eh, forever.
Our kernel produces far superior performance due to providing hooks for the COM layer
Yeah, whatever. There is no COM anywhere near the NT kernel, and the latest and greatest from Microsoft, the .NET framework, isn't even based on COM anymore
Nice troll...
The cfq scheduler in the -mm (Andrew Morton) trees gives very good results in a desktop use.
With anticipatory or deadline, I'm experiencing awful skips with artsd under KDE 3.2 every time there is a heavy disk access, but it's [almost] completely gone with cfq.
To use it, compile a -mm kernel and add the 'elevator=cfq' to the kernel boot parameters through Lilo or Grub.
See this lwn article for more info.
-- don't discount flying pigs until you have good air defense
no: stealing the concept (probably by analysing the code) and writing it themselves.
so if MS make any improvements in their own implementation of the concept, then the code would not be made public and MS benefits and not everyone else.
to elaborate (and in some ways i believe this is what SCO are arguing), lets say i see an open source application that does something neat. it probably won't be patented because the author expects someone to contribute any modifications back. but lets so i don't because i'm a greedy commercial corporate and so i effectively copy the IDEAS behind the application. my code may look quite similar to theirs, but i certaintly have not infringed on the GPL (or have I - i'm no lawyer!).
so if this neat application had an "open source patent" in that anyone infringing on the patent would not be liable for millions, but rather they would be liable and forced to open up the source code of their particular implementation.
Let me start by claiming that optimizing desktop performanceis all about optimizing I/O patterns (contrary to what all Gentoo users think :P). My KDE startup is about three times as fast when I everything is in the disk cache, so it is clear where the bottleneck. (Just try logging in to KDE after boot, then log out and log in again.) A concentrated effort of
- passing on the right hints from KDE via glibc to the kernel (e.g. an madvise() call when loading executables giving the hint that probably most part of the file will be needed later on),
- trying some anticipatory reading of config files/libraries etc. from startkde where it is known that they will be needed, and that they are hopefully laying contigiously on the disk,
- optimizing disk layout for the common access patterns
would IMHO make a far bigger difference for the desktop experience than optimizing compiler flags by using gentoo or using a preemptible kernel.There has been a lot of discussion about this on the kde-optimize list (with Andrew Morton participating), so maybe we can hope that KDE 3.3 will offer some improvements.
As an aside, yes, we all hate the windows registry, but I think we should admit that for boot time optimization it is the right thing to do (having everything in one file that is layed out in one contigious block on the disk.)
AFAIK the "anticipation" bit is not so much about predicting head movement, but is more about reducing head movement. Reads
cause processes to block while waiting for the data (and can thus stall processes for long amounts of time if not scheduled appropriately), whereas writes are typically fire-and-forget. This last bit means that you can usually just queue them up, return control to the user program, and perform the actual write at some more convenient time, i.e. later. Since reads (by the same process) are usually also heavily interdependent, it is also a win to schedule them early from that POV.
That's my understanding of it.
HAND.
This messing with the I/O queue may make things interesting for the journalling process which is kind of vital to integrity. File placement could become even more important for this (and also the placing of journal/log files).
The rest seems to just effectively be a modified elevator (wait a bit before moving).
See my journal, I write things there
Make sure that you set X's "nice" value to 0. Some distros set it to something like -10 so that X is not disturbed by other procs. Under 2.4, this was a good thing. However, under 2.6, with it's superior scheduler, the kernel will keep interrupting X and you will see lagging performance. Google for it to get a better explanation.
Why is it so hot? Where am I going? What am I doing in this handbasket?
So, in turn, should the Linux community cease developing/including things that are "inspired" by Windows?
The absolute translation of logical block to head position is unknown to e.g. Linux. While it is possible to reverse engineer the physical disk layout by looking at timings, for general purpose computing this is going way too far. I think the upcoming ATA-7 hard disk standard has some more options to get information about the layout of the disk, but I'm not sure of that.
Anyway, simple sorting on LBA address will typically reduce head seeks to a large extent, resulting in most of the potential benefit. It is important however to make sure that multiple requests are available to the driver to sort.
Thanks but my father is Croatian and my Mom's French :o)
Anyway, you found out that I indeed am not a native English speaker, hence the neologistications.
Trolling using another account since 2005.
2CPU.com has a Linux kernel comparison of 2.6.4 and 2.4.25 on a SMP system with interesting results.
Elevator seeking is looking at the current request queue and bundle requests which are close together to minimise head movement. This is indeed old. IRC, Linux had it since 2.2 something.
The anticipatory scheduler tries to anticipate future requests (who would have guessed that?), and is relatively new
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Firstly, the 2.6 kernel allows pre-emptive scheduling. Supposedly it was introduced because Linus got tired of his mp3s skipping while he compiled things.
Second, Linux doesn't need a defrag utility. Linux filesystems (Ext2 and Ext3) allocate files properly, using clustering and inodes. The need to defrag comes from the bad design of FAT, which works great on a 8088 processor with tiny files on a 1Meg drive, but is terribly inefficient on anything past a 386.
Of course, there does exist a 'defrag' utility for linux. It just won't gain you much at all.
Effectively Scsi does I/O speedups. Firmware, not hardware, but so is everything. And the speedups by giving Scsi a lot to do and letting it do it in its preferred order can be significant. But Scsi cannot "see" processes - nor file systems. The OS can work out that a process is reading a file and read the next bit of the file - where Scsi would read the next bit of the disk, if it did so at all. The OS can see when you ahve reached EOF, or closed the file, and there is no point prereading.
You don't mean multiple heads on an arm. Multiple heads on an arm would all move together, and you couldn't use two at the same time - the feedback servo which keeps it on track can only respond to one track. What you mean, I think, is two groups of arms (all the arms move together). Manufacturers have looked at that but decided against it.
The arms and associated actuators are some of the most expensive parts of the drive. If you are going to double this cost, why not throw in a few more platters andd an enclosure and have twice the capacity - and twice the throughput.
Putting two actuators in the drive increases power consumption a lot, and heat as well. Both are real problems for current drives. And a "specialist" drive doesn't have the economies of scale, and could cost more than twice as much as two simple drives - which, together, have the same number of heads and twice the capacity.
The real killer is turbulence. If you have two arms on the same surface, each is flying in the wake of the other. And, unlike its own wak, the other alters dynamically, so that seeking arm 1 can perturb arm 2.
Google has it right - lots of dumb hardware, lots of clever software. What we need id filesystems whos allocation patterns are "Raid aware". Particularly Raid 0, I can see file system allocation patterns which could (in conjunction with the otimisations mentioned here) greatly improve performance.
Consciousness is an illusion caused by an excess of self consciousness.
ok, i know this is evil and all - but lets say MS decide to implement this as a concept (so without "stealing" code)... the linux community will have given them something and received (probably) nothing in return.
Not to burst your bubble, but the NT scheduler already implements predictive disk I/O concepts.
Nice that Linux is finally catching up though...
Aside from much better I/O performance, 2.6.x also has much better performance on my notebook (IBM T-series ThinkPad).
I don't know if it's due to SpeedStep support being in the kernel or what, but when I was running 2.4.x with the pre-emptible kernel patches, switching from wall power to battery power meant massive slowdowns, as though I had switched from a PIII-1GHz to a 100MHz Pentium classic. Simple commands like "ps" would take seconds to complete and screen redraws were visible. The whole system would feel like sludge. In spite of this fact, battery life was relatively poor. The combined effect (much slowed system, very short battery life) meant that it was difficult to get anything at all done on battery power.
Now with 2.6.x, when I switch to battery power, there is no perceptible slowdown whatsoever when compared to wall power, and battery life is much improved. Downside: suspending 2.6.x kills USB-uhci, so I've had to compile it as a module and hack up my suspend/resume scripts to reload it each time. But for the speed increase, it's well worth the trouble.
STOP . AMERICA . NOW
Use 'elevator=as' (or cfq, or deadline)
The anticipatory scheduler is the default for the vanilla 2.6 kernel.
Meep.
And we all would have benifited from this if they simply shared in the first place instead of spending 20-30 years "rediscovering" it.
One programmer likened the 70-80s as The Dark Ages. There were cabals and secret voodoo that people sat on and didn't share and you ended up with an ignorant masses that only thought "this is as good as it gets". Hopefully this renaissance sticks because it doesn't matter how good or cool your technology is if you bury it for 20 years without another person knowing.
The original research for anticipatory disk scheduling was done at Rice University by Sitaram Iyer and Peter Druschel and is described here.