Multicore Requires OS Rework, Windows Expert Says
alphadogg writes "With chip makers continuing to increase the number of cores they include on each new generation of their processors, perhaps it's time to rethink the basic architecture of today's operating systems, suggested Dave Probert, a kernel architect within the Windows core operating systems division at Microsoft. The current approach to harnessing the power of multicore processors is complicated and not entirely successful, he argued. The key may not be in throwing more energy into refining techniques such as parallel programming, but rather rethinking the basic abstractions that make up the operating systems model. Today's computers don't get enough performance out of their multicore chips, Probert said. 'Why should you ever, with all this parallel hardware, ever be waiting for your computer?' he asked. Probert made his presentation at the University of Illinois at Urbana-Champaign's Universal Parallel Computing Research Center."
Oh please, this has been coming for years now. Why has it taken so long for the OS designers to get with the program? We've had multi-CPU servers for literally decades.
'Why should you ever, with all this parallel hardware, ever be waiting for your computer?'
Because I/O is always going to be slow.
Sent from my PDP-11
The problem is that most (if not all) peripheral hardware is not parallel in many senses. Hardware in today's computers is serial: You access one device, then another, then another. There are some cases (such as a few good emulators) which use muti-threaded emulation (sound in one thread, graphics in another) but fundamentally the biggest performance kill is the final IRQs that get called to process data. The structure of modern day computers must change to take advantage of multicore systems.
Isn't this the reason for Apple to have rolled out GrandCentral in Snow Leopard? If so, it seems it's not THAT hard to do - at least not that hard for a non-Windows OS.
...the implementation sucks.
Why for example does Windows Explorer decide to freeze ALL network connections when a single URN isn't quickly resolved? Why is it that when my USB drive wakes up, all explorer windows freeze? If you are trying to tell me there's no way using the current abstractions to implement this I say you're mad. For that matter when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem is. You're left piecing together what has and hasn't been moved. File requests make up a good deal of what we're waiting for. It's not the bus or the drives that are usually the limitation. It's the shitty coding. I can live with a hit at startup. I can live with delays if I have to eat into swap. But I'm sick and tired of basic functionality being missing or broken.
These posts express my own personal views, not those of my employer
Why should you ever, with all this parallel hardware, ever be waiting for your computer?' he asked.
Because it might be waiting for I/O.
That's no reason for the entire GUI to freeze on Windows when you insert a CD.
You wait because some programmer thought it was more important to have animated menus than a fast algorithm. You wait because someone was told "computers have lots of disk space." You wait because the engineers never tested their database on a large enough scale. You wait because programmers today are taught to write everything themselves, and to simply expect new hardware to make their mistakes irrelevant.
You do not have a moral or legal right to do absolutely anything you want.
Fist post!
I come to /. to read tech news... not so see people fisting.
Interesting.
Microsoft should go back and read some of the literature on parallel computing from 20-30 years ago. Machines with many cores are nothing new. And Microsoft could have designed for it if they hadn't been busy re-implementing a bloated version of VMS.
Windows explorer sucks. It always just abandons copies after a fail - even if you're moving thousands of files over a network. Yes, you're left wondering which files did/didn't make it. It's actually easier to sometimes copy all the files you want to shift locally, then move the copy, so that you can resume after a fail. It's laughable you have to do this, however.
But it's not a concurrency issue, and neither, really, are the first 2 problems you mention. They're also down to Windows Explorer sucking.
I come to /. to read tech news... not so see people fisting.
Well, I came here to see the fisting. And frankly, so far this site has been a real disappointment.
I don't care if it's 90,000 hectares. That lake was not my doing.
Are you running a 9 year old version of OSX too, or are you comparing a two generation old Windows version to a nice new Mac version? It really sounds like you are comparing apples (snicker) to oranges. After all, both Vista and Windows 7 have no problem running for a long, long time between reboots and don't get slow during that time.
The Problem with Threads (UC Berkeley's Prof Edward Lee)
How to Solve the Parallel Programming Crisis
Half a Century of Crappy Computing
The computer industry will have to wake up to reality sooner or later. We must reinvent the computer; there is no getting around this. The old paradigms from the 20th century do not work anymore because they were not designed for parallel processing.
I'm not sure I get it - GCD just looks like a threadpool library. Windows has had a built-in threadpool API that's been available since Windows 2000, and it seems to do pretty much the same thing as GCD.
Why should you ever, with all this parallel hardware, ever be waiting for your computer?'
For a lot of problems, for the same reason that some guy who just married 8 brides will still have to wait for his baby.
A big problem is the event-driven model of most user interfaces. Almost anything that needs to be done is placed on a serial event queue, which is then processed one event at a time. This prevents race conditions within the GUI, but at a high cost. Both the Mac and Windows started that way, and to a considerable extent, they still work that way. So any event which takes more time than expected stalls the whole event queue. There are attempts to fix this by having "background" processing for events known to be slow, but you have to know which ones are going to be slow in advance. Intermittently slow operations, like an DNS lookup or something which infrequently requires disk I/O, tend to be bottlenecks.
Most languages still handle concurrency very badly. C and C++ are clueless about concurrency. Java and C# know a little about it. Erlang and Go take it more seriously, but are intended for server-side processing. So GUI programmers don't get much help from the language.
In particular, in C and C++, there's locking, but there's no way within the language to even talk about which locks protect which data. Thus, concurrency can't be analyzed automatically. This has become a huge mess in C/C++, as more attributes ("mutable", "volatile", per-thread storage, etc.) have been bolted on to give some hints to the compiler. There's still race condition trouble between compilers and CPUs with long look-ahead and programs with heavy concurrency.
We need better hard-compiled languages that don't punt on concurrency issues. C++ could potentially have been fixed, but the C++ committee is in denial about the problem; they're still in template la-la land, adding features few need and fewer will use correctly, rather than trying to do something about reliability issues. C# is only slightly better; Microsoft Research did some work on "Polyphonic C#", but nobody seems to use that. Yes, there are lots of obscure academic languages that address concurrency. Few are used in the real world.
Game programmers have more of a clue in this area. They're used to designing software that has to keep the GUI not only updated but visually consistent, even if there are delays in getting data from some external source. Game developers think a lot about systems which look consistent at all times, and come gracefully into synchronization with outside data sources as the data catches up. Modern MMORPGs do far better at handling lag than browsers do. Game developers, though, assume they own most of the available compute resources; they're not trying to minimize CPU consumption so that other work can run. (Nor do they worry too much about not running down the battery, the other big constraint today.)
Incidentally, modern tools for hardware design know far more about timing and concurrency than anything in the programming world. It's quite possible to deal with concurrency effectively. But you pay $100,000 per year per seat for the software tools used in modern CPU design.
I wish I could mod you higher than +5, you just summed up some of the things that bother me most about the OS that is somehow still the most popular desktop OS in the world.
.
To anyone using Windows (XP, Vista or 7) right now, go ahead and open up an Explorer window, and type in ftp:// followed by any url.
Even when it's a name that obviously won't resolve, or an ip of your very own local network of a machine that just doesn't exist, this'll hang your Explorer window for a couple of solid seconds. If you're a truly patient person, try doing that with a name that does resolve, like ftp://microsoft.com . Better yet, try stopping it.... say goodbye to your explorer.exe
This is one of the worst user experiences possible, all for a mundane task like using ftp. And this has been present in Windows for what, a decade?
+1 Funny Signature
Actually most current monolithic kernels are multithreaded, so they can have one thread working on reading that CD, while another threads handles user input, etc. The only difference from microkernels is that it's all in a single address space.
I noticed the same on my mac. With a set of eight CPU graph meters in the menu bar, they're almost always evenly pitched anywhere from idle to 100%, with a few notable exceptions like second life, some photoshop filters, and firefox of all things.
When booted into Win, more often than not I have two cores pegged high, and the others idle. Getting even use out of all cores is the exception, not the rule.
This is pretty much completely down to the application mix. Windows has no trouble whatsoever scheduling processes and threads to max out 8 (or 16, or whatever) CPUs, but if the applications are only coded to have, say, 1 or 2 "processing" threads, then there's nothing the OS can do to change that.
For that matter when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem is.
You can as of Vista.
or basically replaces windows with something else.
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
The trick with GCD is that it is somewhat more high-level than a simple thread pool - it operates in terms of tasks, not threads. The difference is that tasks have explicit dependencies on other tasks - this lets scheduler be smarter about allocating cores.
Transaction is copying some files, failing in the middle, and not rolling back those copied over??
Hint. It's not transaction. It's just a bad piece of software that fails badly at doing it's basic job. Handling files.
There are two types of people in the world: Those who crave closure
It seems you are severely underestimating what GCD means to the application developer. I strongly suggest you read parts 12 and 13 of John Siracusa's excellent review very carefully. As Siracusa says,
Those with some multithreaded programming experience may be unimpressed with the GCD. So Apple made a thread pool. Big deal. They've been around forever. But the angels are in the details. Yes, the implementation of queues and threads has an elegant simplicity, and baking it into the lowest levels of the OS really helps to lower the perceived barrier to entry, but it's the API built around blocks that makes Grand Central Dispatch so attractive to developers. Just as Time Machine was "the first backup system people will actually use," Grand Central Dispatch is poised to finally spread the heretofore dark art of asynchronous application design to all Mac OS X developers. I can't wait.
The author is talking about a complete OS redesign, not a new threading system.
That's actually pretty good typing with your fists. Do you have a comically large keyboard?
I love how Microsoft can come along in 2010 and with a straight face say it's about time they took multiprocessing seriously. Or say it's about time we started putting HTML5 features into our browser. And we're finally going to support the ISO audio video standard from 2002. And by the way, it's about time we let you know that our answer to the 2007 iPhone will be shipping in 2011. And look how great it is that we just got 10% of our platform modernized off the 2001 XP version! And our office suite is just about ready to discover that the World Wide Web exists. It's like they are in a time warp.
I know they have product managers instead of product designers, and so have to crib design from the rest of the industry, necessitating them to be years behind, but on engineering stuff like multiprocessing, you expect them to at least have read the memo from Intel in 2005 about single cores not scaling and how the future was going to be 128 core chips before you know it.
I guess when you recognize that Windows Vista was really Windows 2003 and Windows 7 is really Windows 2005 then it makes some sense. It really is time for them to start taking multiprocessing seriously.
I am so glad I stopped using their products in 1999.
there is a option, at least as far back as xp that allows explorer windows to run as their own tasks. Why its not enabled by default i have no clue about (except that i have seen some issues with custom icons when doing so).
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
so basically a big pile of C64 wired to a single keyboard and screen, via a BIG kvm switch?
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
The part of the article where Probert discusses the operating system becoming something like a hypervisor reminds me of the Cache Kernel from a Stanford University paper back in 1994. http://www-dsg.stanford.edu/papers/cachekernel/main.html
The way I understand it, the cache kernel in kernel mode doesn't really have built-in policy for traditional OS tasks like scheduing or resource management. It just serves as a cache for loading and unloading for things like addresses spaces and threads and making them active. The policy for working with these things comes from separate application kernels in user mode and kernel objects that are loaded by the cache kernel.
There's also a 1997 MIT paper on exokernels (http://pdos.csail.mit.edu/papers/exo-sosp97/exo-sosp97.html). The idea is separating the responsibility of management from the responsibility of protection. The exokernel knows how to protect resources and the application knows how to make them sing. In the paper, they build a webserver on this architecture and it performs very well.
Both of these papers have research operating systems that demonstate specialized "native" applications running alongside unmodified UNIX applications running on UNIX emulators. That would suggest rebuilding an operating system in one of these styles wouldn't entail throwing out all the existing software or immediately forcing a new programming model on developers who aren't ready.
Microsoft used to talk about "personalities" in NT. It had subsystems for OS/2 1.x, WIn16, and Win32 that would allow apps from OS/2 (character mode), Windows 3.1 and Windows NT running as peers on top of the NT kernel. Perhaps someday the subsystems come back, some as OS personalities running traditional apps, and some as whole applications with resource management policy in their own right. Notepad might just run on the Win32 subsystem, but Photoshop might be interested in managing its own memory as well as disk space.
The mid-90s were fun for OS research, weren't they? :)
-- John Truong
If we want efficient code, we have to figure out ways to reward the programmers that write it. I don't see any sign that people anywhere are interested in doing this. Anyone have suggestions for how it might be done?
It's happening, from a source people didn't expect: portable devices. Battery life is becoming a primary feature of portable devices, and a large fraction of that comes from software efficiency. Take your average cell phone: it's probably got a half dozen cores running in it. One in the wifi, one in the baseband, maybe one doing voice codec, another doing audio decode, one (or more) doing video decode and/or 3d, and some others hiding away doing odds and ends.
The portable devices industry has been doing multi-core for ages. It's how your average cell phone manages immense power savings: you can power on/off those cores as necessary, switch their frequencies, and so on. They have engineers who understand how to do this. They're rewarded for getting it right: the reward is it lives on battery longer, and it's measurable.
Yes, you can get lazy and say 'next generation CPUs will be more efficient', but you'll be beaten by your competitors for battery life. Or, you fit a bigger battery and you lose in form factor.
The world is going mobile, and that'll be the push we need to get software efficient again.
For that matter when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem
Try TotalCopy which adds a copy/move in the right click menu; or Teracopy commercial (free version available, supports Win7) complete replacement for the sucky Windows copy system.
USB/Network freezes and file copying isn't a fault of CPU cores like you say, Windows is just a sucky OS. Multicore stuff gets complicated, but this isn't going to be a panacea for Microsoft, it's another marketing opportunity.
I'm getting really sick of posting this, but I'll continue to do so until you do.
BUILD A WORKING PROTOTYPE OF THIS "UNIVERSAL BEHAVING MACHINE", OR SHUT THE HELL UP.
Those of us who aren't insane aren't impressed by talk, we're impressed by results. If you spend half as much effort building the thing as you do flapping your damn jaw, you'd be done by now.
(For any uninitiated mods, this fellow is slashdot poster "rebelscience", and maintains a website of the same name. Every time a multiprocessing-related thread comes up, he posts this tripe but has never actually done anything about it. Visit his website, and you'll see why I call him a lunatic)
Well, I came here to see the fisting. And frankly, so far this site has been a real disappointment.
You have to read at -1 to see the goatse trolls.
I am becoming gerund, destroyer of verbs.
.NET apps DO use a virtual machine, the Common Language Runtime and support .NET IL. However, the Virtual Machine DOES use just-in-time compiling and precompiling to compile the code to native code before it runs it.
Same as any halfway decent desktop Java Virtual Machine implementation does now (mobile JVMs usually use hardware features like ARM Jazzelle to run the Java code faster)
It doesn't make it as different as you seem to think.
I think GCD is a great idea, and a very useful tool, but it's not a magic bullet. GCD can schedule some things more effectively because it has a system-wide view. The closure extensions and GCD interface makes it reasonably easy for novice programmers to get things actually running in parallel.
Of the two, the latter has a MUCH bigger impact in terms of actually getting programs to take advantage of multiple cores. Actually sending
BUT, it's nothing you can't do (and hasn't been done) with various multiprocessing libraries, many of which run on Windows, or with good old threads and processes if you've got a moderate level of skill. In order for it to work effectively the programmer still has to a) structure his program in such a way that the parallelism is exposed and b) actually use GCD.
Contrary to what you seem to suggest, GCD does not really "creates and manages threads on its own, even in applications that are not written to be threaded." It creates threads at the (indirect) request of the application and schedules them appropriately. The application MUST be designed to take advantage of multithreading. The only difference is that GCD makes it easier for the programmer to actually get those threads up and running, and can possibly schedule them more effectively.
Fixed in Vista and 7, you can ignore errors and continue copying.
The article in question talks about Winblows.
What about Linux?
Does it need retooling as well?
Muchas Gracias, Señor Edward Snowden !
I've always thought that both data flow languages and fortran95 had some innovations for multi-core programming worthy of being copied.
Data flow languages such as "G" which is sold as national instruments "labview" brand are intrinsically parallel at many levels. What they do is look at a function call as a list of unsatisfied inputs. These inputs are waiting for the data to arrive to make the variables valid. Then the subroutine fires. Thus every single function is potenitally a parallel process. it's just waiting on it's data. If you program in a serial fashion then of course those functions get called serially. But with graphic programming in 2D, you almost never are programming serially. You are just wiring outputs of other functions to inputs of others. Serial dependencies do arise but these are asynchronous and localized cliques. everything else is parallel. Yet you never ever ever actually write parallel code. it just happens automatically. Perl data language had a glimpse of this but it's not the same thing since the language is still perl and thus not parallel.
Objective-C with it's "message passing" abstraction is perhaps getting closer to the idea of a data flow. While one might complain that well objective-C message passing is just a different sugar coating of C just like C++ is. This would be true from the user's point of view. But it's not as true from the Operating system's point of view. IN OSX, these messages are passing more like actual socket programming at the kernel level. So there's more to objective C on apple's than meets the eye. But I don't know how far you can push that abstraction.
In fortran there are some rather simple but powerful multi-processor optimizations. First there's loops like "forall" that designate that a loop can be done in any order of the loop index and even in parallel. and then there's vectorized statements as part of the language like matrix multiplies. those are rather simple things so don't solve much but they do show that you can put a lot of compiler hinting into the language itself without re-inventing the language.
Some drink at the fountain of knowledge. Others just gargle.
You may not have to write your code around threading, but you then have to write it around grand central dispatch. Having GCD available is going to do absolutely nothing for a program that was not written with GCD in mind. Its changing one set of problems/features for another. Writing multi-threaded software isn't exceptionally hard. I have done a lot of it. It may take a lot less code with GCD, but you also give up control. Even using GCD with code blocks you still have to deal with the problems that can be a pain in the ass, things like concurrency, blocking and munging data.
First, the article in question talks about OS architecture, not Windows specifically. He specifically states that what he is speaking about is not something MS is working on. Quite the opposite, many of his MS colleagues disagree with him.
Second, the fundamental problems with OS design are exactly that: fundamental problems with OS design. Nobody is making an OS that truly takes advantage of multiple cores, it's still single-processor thinking with the ability to use more than one processor, and this leads to a number of inherent problems.
The article talks about what an OS might look like if built from scratch specifically for multiple core processing power, and there is nothing on the market like it at the moment. It's basically a hypervisor-based OS, where instead of giving programs slices of CPU time, the OS gives programs actual CPUs and slices of memory to use.
Something like that would be extremely slick, we already do that for virtual machines and we end up with 8+ full-fledged servers running on the same machine. Why can't you pull that back a little more so it's individual programs assigned to each CPU such that they don't have to interact with the OS at all once they are up and running? Can you imagine?
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Apple's grand-central dispatch (GCD) solution is really primitive. It's just a simple thread-pool, where the programmer breaks their program down into tasks that can be executed independently then queues them for execution by the thread-pool.
GCD is not in the slightest innovative, except for a hack that allows "c" programmers to write tasks with slightly more convenience, by adding limited "closure" support to the language.
Similar concepts can be found all over the place; just see the "see also" section on the wikipedia article:
http://en.wikipedia.org/wiki/Grand_Central_Dispatch
Using any of the libs listed in that "see also" section, you can get GCD equivalent behaviour on unix/windows, and have been able to for years.
There are also languages with far superior parallel-processing abilities, where the effort is done by the compiler/environment, not the programmer. See any functional language, eg Haskell or Erlang. Write a program in these languages, and the parallel-processing happens just about automatically.
Adding parallelism to the *OS* is quite a different issue, and not one that Apple's GCD addresses.
Windows Explorer no longer kills network transfers after a failure as of Windows Vista.
Maybe some of the people complaining about Windows should stop using a version thats 9 years old (XP). Red Hat 7.2 isn't particularly great by today's standards either.
Please move along
I still cannot find the droids I am looking for...
What's wrong with at least some operating systems doesn't even have anything to do with multiple cores per se. They're simply designing the OS and its UI incorrectly, assigning the wrong priorities to events. No event should EVER supersede the ability of a user to interact and intercede with the operating system (and applications). Nothing should EVER happen to prevent a user being able to move the mouse, access the start menu, etc., yet this still happens in both Windows and Linux distributions. That's a fucked-up set of priorities, when the user sitting in front of the damned box - who probably paid for it - gets second billing when it comes to CPU cycles.
It doesn't matter if there's one CPU core or a hundred. It's the fundamental design priorities that are screwed up. Hell should freeze over before a user is denied the ability to interact, intercede, or override, regardless how many cores are present. Apparently hell has already frozen over and I just didn't get the memo?
Looks like Tanenbaum will have been right after all, I mean, a vast amount of cores and huge parallelism is the advent of the micro- and exokernel, isn't it? This would be the simplest way to harness the multiple cores (instead of modifying a monolithic kernel to use multiple cores)
I'd browse at -2 if I could.
English is not this
I am finding it very difficult to believe that you have actually used GCD. I have, and have read most of the code for the implementation. Creating threads is not hard - it is definitely not what makes parallel programming difficult. The difficult bit is splitting your code into parts that have no interdependencies and so can execute concurrently.
When you use libdispatch, you still have to do this. All that it does for you is implement an N:M threading model. It allocates a group of kernel threads and then multiplexes them into work queues. The pthread_workqueue_*_np() family of system calls lets the kernel decide the optimum number of kernel threads for the application, depending on system configuration and load. The libdispatch code then executes blocks on these threads. This saves some thread creation time and saves some context switching and cache churn because it runs blocks sequentially in a small number of threads (ideally one per core), rather than running them all concurrently on a separate thread.
It creates and manages threads on its own, even in applications that are not written to be threaded
No it doesn't. You must get a dispatch_queue_t and then send it blocks to execute concurrently. You must do this explicitly.
I am TheRaven on Soylent News
iPhone isn't even slightly "instant on" - it takes at least a minute to boot an iPhone from off. What you're seeing most of the time is "screen off" mode. Unsurprisingly, switching the screen on & cranking up the CPU clock doesn't take much time. Likewise, waking my Windows box up from sleep doesn't take very long either. Comparing modern OS software running on modern hardware I see little difference in boot times, or wake time from sleep - which would indicate that if MS are being lazy then so are Apple & all the devs in the Linux & BSD worlds. As for why my ST used to boot so much quicker, well the lack of discs helped, as did the lack of hardware variance (and thus lack of drivers to load & start).
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"