They were eminently practical suggestions. All sorts of "operation completed" virtualizations already happen.
When you fire off a print job, there's no particular reason for a user to stare at a progress dialog as the document is formatted and spooled to a print queue; s/he just wants to know that the operation has been submitted and needs no further user interaction.
OS filesystems commonly have write buffering at numerous levels. Working with floppies or other slow media is slick under Linux because operations appear to complete immediately as opposed to blocking as under Windows.
Any technique which can reduce the user's waiting time without compromising the integrity of their data can make for a better and more productive user experience.
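To make the write buffering idea concrete, here's a minimal sketch of a write-behind buffer. The names (WriteBehind, floppy_write) are made up for illustration and this isn't how any particular OS implements it; it just shows writes "completing" immediately while a background worker drains them to slow media, with an explicit flush as the point where data integrity is guaranteed.

```python
import queue
import threading
import time


class WriteBehind:
    """Accepts writes immediately and drains them to a slow sink in the background."""

    def __init__(self, slow_write):
        self._slow_write = slow_write      # hypothetical slow device write function
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, data):
        # Returns as soon as the data is queued; the caller never waits on the device.
        self._pending.put(data)

    def flush(self):
        # Blocks until everything queued so far has reached the device.
        self._pending.join()

    def _drain(self):
        while True:
            data = self._pending.get()
            try:
                self._slow_write(data)
            finally:
                self._pending.task_done()


device = []


def floppy_write(data):
    time.sleep(0.05)        # stand-in for slow media
    device.append(data)


buf = WriteBehind(floppy_write)
for chunk in (b"a", b"b", b"c"):
    buf.write(chunk)        # appears to complete immediately
buf.flush()                 # wait here before "ejecting the disk"
```

The catch is the same one as with real buffering: until flush returns, the data only exists in memory.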
Very interesting post, wish I had me some mod points. The point about CPU speeds eclipsing storage access times is especially relevant.
Well, Intel is already encountering heat problems which limit how fast they can crank the clockspeed. Hyperthreading is a moderately successful attempt to make use of the available execution units on the chip which would otherwise sit idle. It's also not so new and untested; it was implemented but not enabled on earlier P4 steppings.
Athlon and Athlon64 are generally better able to make use of their execution units, and wouldn't benefit from HT as much as P4/Xeon.
I don't think the initial mockups show all that the new GTK+ API can do, but yeah the KDE dialog is well implemented.
Fair enough, but sometimes it's nice to be able to open for editing a new copy of a file which exists somewhere else. For example, if you commonly run with an auto-save feature enabled then it may not work to open an original template file and eventually save-as your destination file.
I like the idea of having a reasonably full set of file and directory management functions available from the various file dialogs, as long as some care is taken to keep them unobtrusive. If KDE can reach a good balance, then so can GTK+.
The point is, you don't actually want what you're asking for. It's easy to say "I want the entire rest of the machine to halt and service my foreground task until I change to a different foreground task", but that has all sorts of bad consequences.
It's not because any protocols or interfaces are badly designed, it's because you're asking for a draconian approach that interferes with the normal functioning of the system when other effective approaches are available.
An example: When you go out jogging, you don't want your liver and kidneys to shut down just because they're not directly related to making you run faster.
Okay, maybe a better example: you're downloading the latest ISOs over a relatively slow internet connection. Say that for every ten seconds of download time, it takes the system a tenth of a second to save the data to disk. Surely you don't want to freeze your download by forbidding even an occasional 100ms timeslice to do the disk save.
Likewise, you might have a huge print job which the printer can only accept at a slow rate, but which is a trivial load to the main CPU. There's no reason to freeze the print job when doing so won't make the slightest difference to foreground GUI responsiveness.
No, he's not surfing any lists. He's grabbing an element off the head and stuffing another at the tail -- both O(1) operations.
The only challenge then is sorting based on priority, which at this point in my post-Christmas drunken-ness is probably beyond my comprehension. :-D
You're in luck: The priority directly indexes an array to find the appropriate linked list and no sorting is required. Party on!
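A rough sketch of that structure, for anyone who wants to see it spelled out. This is a simplified illustration, not the kernel's code: the class and method names are made up, 140 levels is taken as an assumption about the 2.6 priority range, and lower numbers mean higher priority. A bitmap remembers which lists are non-empty so that finding the best one doesn't require walking the array.

```python
from collections import deque

NUM_PRIORITIES = 140   # assumption: 0 is the highest priority, 139 the lowest


class RunQueue:
    def __init__(self, levels=NUM_PRIORITIES):
        self.lists = [deque() for _ in range(levels)]
        self.bitmap = 0    # bit p is set while lists[p] is non-empty

    def enqueue(self, task, prio):
        # The priority directly indexes the array; the task goes on the tail.
        self.lists[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if self.bitmap == 0:
            return None
        # Lowest set bit = highest-priority non-empty list.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.lists[prio].popleft()   # grab from the head
        if not self.lists[prio]:
            self.bitmap &= ~(1 << prio)
        return task, prio


rq = RunQueue()
rq.enqueue("seti", 139)
rq.enqueue("mp3", 100)
rq.enqueue("driver", 5)
rq.enqueue("driver2", 5)
order = [rq.pick_next() for _ in range(5)]
```

Nothing is ever sorted: the cost of enqueue and pick_next depends on the fixed number of priority levels, not on how many tasks are waiting.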
A wonderfully lucid explanation, someone pls mod parent up.
Does 2.6 have SMP load balancing based on priority? I'm wondering about the case of a system where CPU 0 has ten high priority processes and CPU 1 has ten low priority processes. Presumably both CPUs could continue crunching away on their respective runqueues, but you'd think it'd be better to distribute the high priority processes across available CPUs.
My scheme? Try Linus and Bill et al.
Threads may represent daemons / drivers for devices which need immediate attention and run at high priorities or whose priorities are adjusted on the fly as needed by the OS.
You wouldn't want SETI or Folding@home or even animated GIFs in your browser sucking up machine cycles needed to satisfy system devices or keep your MP3s playing without interruption. Under the previous 10ms 'jiffy' setting, it could take a long time to chug through a round-robin of a large set of threads.
I'm not saying it couldn't be made to work, you'd just have terrible latency and probably bad throughput too.
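Back-of-the-envelope version of the latency problem, as a sketch. It assumes the worst case where every thread ahead of you in a plain round-robin burns one full tick; the 1ms figure is included only for comparison and is my assumption about the newer setting.

```python
def worst_case_wait_ms(threads_ahead, tick_ms):
    """Wait before a thread runs if each thread ahead of it uses one full tick."""
    return threads_ahead * tick_ms


wait_10ms_tick = worst_case_wait_ms(100, 10)   # 100 threads ahead, 10ms jiffy
wait_1ms_tick = worst_case_wait_ms(100, 1)     # same queue, assumed 1ms tick
print(wait_10ms_tick, wait_1ms_tick)           # 1000 100
```

A full second before an audio thread gets the CPU is plenty to make playback skip, which is why ordering by priority matters more than the tick length alone.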
We've been speaking of kernel scheduling, but a given GUI implementation can also have a dramatic effect on whether delays in one part of the system will result in sluggish UI response. IIRC, KDE is largely single-threaded and there are lots of cases where it will be momentarily hung-up waiting on some global resource. Maybe someone could elaborate.
Sometimes a subsystem really wants attention NOW, not after another 172 of a total 187 threads have been given a chance to execute in the round-robin. Your solution essentially ignored the management of N threads and then claimed credit for the savings yielded by not managing those N threads.
Hell, I can build a 100mpg car if my car doesn't have to carry passengers [seats, sturdy chassis] or handle bad roads [full suspension] or accelerate [multi-gear transmission] or slow down [brakes] or turn [steering rack]...
Nobody suggested constant size timeslices. I just observed that while the algorithm above varied the timeslices, it didn't attempt to service higher priority processes sooner and therefore wasn't a viable solution.
If you claim that producing an O(1) solution is trivial, then the example you present must solve the problem under consideration!
FWIW, it wasn't clear to me how any of the methods presented here or at Ars would prevent starvation of lower priority threads that still appreciate being scheduled now and then. Maybe that's just an implementation detail of the O(1) function which calculates priorities.
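One approach to starvation, and roughly how the 2.6 O(1) scheduler is usually described, is to keep two sets of priority lists: "active" and "expired". A task that uses up its timeslice moves to expired and can't run again until every active task has had its turn, at which point the two sets are swapped. The sketch below is a heavy simplification under stated assumptions: names are hypothetical, there are only four levels, every task always uses its whole slice, and the linear scan stands in for a bitmap lookup. It leaves out dynamic priority recalculation entirely.

```python
from collections import deque

LEVELS = 4   # small hypothetical number of priority levels; 0 is highest


class TwoArrayScheduler:
    def __init__(self):
        self.active = [deque() for _ in range(LEVELS)]
        self.expired = [deque() for _ in range(LEVELS)]

    def add(self, name, prio):
        self.active[prio].append(name)

    def run_one_slice(self):
        """Run the highest-priority active task for one full timeslice."""
        if not any(self.active):
            # Every runnable task has used its slice: swap the two arrays.
            self.active, self.expired = self.expired, self.active
        for prio, tasks in enumerate(self.active):
            if tasks:
                name = tasks.popleft()
                # The task has exhausted its slice; it waits until the swap.
                self.expired[prio].append(name)
                return name
        return None


sched = TwoArrayScheduler()
sched.add("driver", 0)
sched.add("mp3", 1)
sched.add("seti", 3)
trace = [sched.run_one_slice() for _ in range(6)]
```

High priority tasks still go first within each round, but "seti" is guaranteed a slice before "driver" gets a second one.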
This solution has no notion of higher priority processes taking precedence over lower priority ones, which is the main point of scheduling. It adjusts the relative timeslice of each process scheduled as it round robins through the [solitary] linked list, but it makes no attempt to schedule higher priority ones first.
Thank you, this draft was immensely more interesting than the Ars version.
I'm guessing that the NUMA extensions added some complexity to the load balancing as it moves threads between runqueues?
I've been running a pair of unmodified Palomino XP 2000+ in a Tiger MPX for two years. Other non-MP Athlons can be run SMP using pencil bridge tricks. Try 2cpu.com and other sources.
Of course, this AMD motherboard chipset is ancient now -- you might be better off with a low-end Opteron.
Actually an interesting proposition. IIRC, there were tests on P-II with disabled caches which lost maybe 30% in performance but became more amenable to overclocking. An Athlon64 2200+ for $20 could be an attractive chip for some.
The 68000 may've been fully 32-bit, but it lacked the 32-bit multiply that was finally added with the 68020. Not a trivial omission for a CISC machine.
AFAIK, the 939-pin socket will be dual channel. Or 128-bit single channel, which isn't quite the same thing.
"A processor that is twice as fast is always better than two processors."
True most of the time, but two processors have twice the cache and may suffer less context switch penalty, depending on how multi-threaded the application.
"That's why, so far, graphics cards and PCs have stuck to one chip. Yes, there are exceptions (Voodoo5, Opteron/Xeon/G4/G5), but they never sell as well as the cheaper 1p systems."
Cheaper systems sell better largely because they're cheaper.
For legacy 16-bit code definitely. But Pentium Pro was the clear winner on 32-bit.
Well, I spoke too quickly. There're conflicting assertions all over the web, but according to AMD's website:
All Opterons have three HyperTransport links. One of the 2xx series' HT links is coherent, allowing it to talk to another CPU. All three of the 8xx's are coherent, while none of the 1xx's are. The Athlon64 and AthlonFX each have one HT link, which is non-coherent.
Makes ya wonder whether a company such as Powerleap might come out with a CPU adapter to support it. For a long time the Athlon MP series offered the only affordable SMP solution, especially if like me you found a pair of Athlon XPs which worked happily in SMP mode.
No, Athlon64 / FX / Opteron all have the same 32-bit and 64-bit processor modes.
That's all true, but different chipsets have different speed HyperTransport connections, which may have some impact depending on what the system is doing. IIRC, the AGP interfaces may need some work.