Windows and Linux Not Well Prepared For Multicore Chips
Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much in the way of performance gains from Windows 7 (or Linux) from eight-core chips that come out from Intel this year. "For systems going beyond quad-core chips, the performance may actually drop beyond quad-core chips. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."
So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.
This isn't a performance issue with Windows or Linux; they're quite adept at handling multiple cores. They just don't need that many themselves, and the applications people run these days, individually, don't need much more than that either.
So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.
"The development tools aren't available and research is only starting"
Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).
If you prefer books, there are Patterns for Parallel Programming, The Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.
All of these are available now, and some have been available for years.
The problem isn't that tools aren't available; it's that programmers aren't preparing themselves and haven't embraced the right tools.
The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that Linux and Windows aren't designed for multicore computers, but the linked article only claims that some applications are not designed to be multi-threaded or to run multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawn multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spawns a bunch of processes?
The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on slashdot?
In fact, TFA doesn't even use the words "Linux" or "Windows."
What good are multiple cores and threads when you are running an event-driven GUI application?
Mozilla Firefox is an event-driven GUI application. But if I open a page in a new tab, a big reflow or JavaScript run in that page can freeze the page I'm looking at. You can see this yourself: open this page in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this. Other applications need to spawn threads when calling an API that blocks, such as gethostbyname() or getaddrinfo(); otherwise, the part of the program that interacts with the user will freeze. But these are the kind of threads that are useful even on a single core, not multicore-specific optimizations.
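A minimal sketch of that last point, in Python. The blocking lookup runs on a worker thread while the thread that talks to the user keeps handling events. `socket.getaddrinfo` is the real standard-library call; `resolve_in_background`, `ui_loop`, and the `resolver` parameter are made-up names for illustration, not taken from any application mentioned here.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# One shared worker pool for blocking calls (size chosen arbitrarily for the sketch).
_pool = ThreadPoolExecutor(max_workers=4)

def resolve_in_background(host, port, resolver=socket.getaddrinfo):
    """Start a blocking lookup on a worker thread and return a Future at once.

    `resolver` is a parameter only so the sketch can be exercised without
    a network; by default it is the blocking socket.getaddrinfo call.
    """
    return _pool.submit(resolver, host, port)

def ui_loop(future, handle_event, max_ticks=1000):
    """Hypothetical event loop: keep handling events until the lookup finishes."""
    ticks = 0
    while not future.done() and ticks < max_ticks:
        handle_event()  # the interface stays responsive here
        ticks += 1
    return future.result()  # blocks only if max_ticks ran out first
```

This helps on a single core just as much as on eight: the worker thread spends its time waiting, not computing.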
I think this problem will take longer than a year or two to solve. Modern computers are really fast. They solve simple problems almost instantly. A side effect of this is that if you underestimate the computational power required for the problem at hand, you are likely to be off by a large amount.
If you implemented an order n-squared algorithm, O(n^2), on a 6502 (Apple II) and n was larger than a few hundred, you were dead. Many programmers wouldn't even try implementing hard algorithms on the early Apple II computers. On the other hand, a modern processor might tolerate O(n^2) algorithms with n larger than 1000. Programmers can try solving much harder problems. However, programmers' ability to estimate and deal with computational complexity has not changed since the early days of computers. Programmers use generalities. They use ranges: n will be between 5 and 100, or n will be between 1000 and 100,000. With modern problems, n=1000 might mean the problem can be solved on a netbook, and n=100,000 might require a small multi-core cluster.
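To put rough numbers on that range, here is a back-of-the-envelope sketch in Python. It assumes the cost really is proportional to n^2 and ignores constant factors, memory, and cache effects; the function name is made up for illustration.

```python
def quadratic_work_ratio(n_small, n_large):
    """How many times more basic steps an O(n^2) algorithm needs at n_large
    than at n_small, assuming cost is exactly proportional to n^2."""
    return (n_large / n_small) ** 2

# Going from the "netbook" end of the range to the "cluster" end:
print(quadratic_work_ratio(1000, 100_000))  # 10000.0
```

So a 100-fold error in estimating n turns into a 10,000-fold error in the work to be done, which no quad-core chip will absorb.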
There aren't many programming platforms out there that scale smoothly from applications deployed on a desktop, to applications deployed on a multi-core desktop, and then to clusters of multi-core desktops. Perhaps most worrying is that the new programming languages that are coming out are not particularly useful for intense data analysis. The big examples of this for me are .NET and functional languages. .NET was deployed at about the same time multi-core chips showed up, and has minimal support for them. Functional languages may eventually be the solution, but for any numerically intensive application, tight loops of C code are much faster.
The other issue with multi-core chips is that, as a programmer, I have two solutions to making my code go faster:
1. Get out the assembly printouts and the profiler, and figure out why the processor is running slow. Doing this helps every user of the application, and works well with almost any of the serious compiled languages (C, C++). Sometimes, I can get a 10:1 speed improvement.(*) It doesn't work so well with Java, .NET, or many functional languages, because they use run-time compilers/interpreters and don't generate assembly code.
2. I recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice, because they reduce cluster size and cost, but a cluster will likely be required.
The problem with both of the above approaches is that, from a tools perspective, they are the worst choice for multi-core optimizations. Approach 1 will force me into using C and C++, which don't even handle threads really well. In particular, C and C++ lack an easy implementation of Software Transactional Memory, NUMA, and clusters. This means that approach 2 may require a complete software redesign, and possibly either a language change or a major change in the compilation environment. Either way, my days of fun loving Java and .NET code are coming to a sudden end.
I just don't think there is any easy way around it. The tools aren't yet available for easy implementation of fast code that scales between the single-core assumption and the multi-core assumption in a smooth manner.
Note: * - By default, many programmers don't take advantage of many features that may increase the speed of an algorithm. Special-purpose instruction sets, like MMX, can dramatically speed up certain loops. Sometimes loops contain a great deal of code that can be eliminated. Maybe a function call is present in a tight loop. Anti-virus software can dramatically affect system speed. Many little things can sometimes make big differences.
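One of those little things, a function call sitting inside a tight loop, can be sketched as follows. It is written in Python for brevity even though the note is really about C-level code, and the function names are made up for illustration.

```python
import math

def scale_slow(values, angle):
    # Recomputes the same cosine on every single iteration.
    out = []
    for v in values:
        out.append(v * math.cos(angle))
    return out

def scale_fast(values, angle):
    # The loop-invariant call is hoisted out of the loop:
    # same results, one call instead of len(values) calls.
    c = math.cos(angle)
    return [v * c for v in values]
```

An optimizing C compiler can sometimes do this hoisting itself, but only when it can prove the call has no side effects, which is why the profiler and the assembly listing are still worth reading.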
To dumb your message down, CPU manufacturers act like book publishers who want you to read one book in two different places at the same time just because you happen to have two eyes. But a story can't be read this way, and for the same reason most programs don't benefit from several CPU cores. Books are read page by page because each little bit of story depends on previous story; buildings are constructed one floor at a time because each new floor of a building sits on top of lower floors; a game renders one map at a time because it's pointless to render other maps until the player made his gameplay decisions and arrived there.
In this particular case CPU manufacturers do what they do simply because that's the only thing they know how to do. We, as users, would for most tasks much prefer a single 1 THz CPU core, but we can't have that yet.
There are engineering and scientific tasks that can be easily subdivided - this comes to mind - and these are very CPU-intensive tasks. They will benefit from as many cores as you can scare up. But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.
This is the sort of thing I like about Apple's 'Grand Central'. The idea behind it is that instead of assigning a task to a processor, it breaks up a task into discrete compute units that can be assigned wherever. When doing processing in a loop, for example, if each iteration is independent, you could make each iteration a separate 'unit', like a packet of computation.
The end result is that the system can then more efficiently dole out these 'packets' without the programmer having to know about the target machine or vice-versa. For some computation, you could use all manner of different hardware - two dual-core CPUs and your programmable GPU, for example - because again, you don't need to know what it's running on. The system routes computation packets to wherever they can go, and then receives the results.
Instead of looking at a program as a series of discrete threads, each representing a concurrent task, it breaks up a program's computation into discrete chunks, and manages them accordingly. Some might have a higher priority and thus get processed first (think QoS in networking), without having to prioritize or deprioritize an entire process. If a specific packet needs to wait on I/O, then it can be put on hold until the I/O is done, and the CPU can be put back to work on another packet in the meantime.
What you get in the end is a far more granular, more practical way of thinking about computation that would scale far better as the number of processing units and tasks increases.
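As a rough analogy only, here is the "each independent iteration is a unit" idea sketched with Python's standard `concurrent.futures` pool. This is not Apple's API. `process_unit` and `run_units` are hypothetical names, and in CPython a thread pool will not actually speed up CPU-bound units, so treat it as an illustration of the programming model and not of the performance.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_unit(i):
    """One independent loop iteration, packaged as a unit of work."""
    return i * i

def run_units(count, workers=None):
    # The caller says what the units are; the pool decides which worker
    # runs each one. The caller never names a core or a thread.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands the units out and returns results in submission order.
        return list(pool.map(process_unit, range(count)))
```

The calling code looks the same whether the pool has one worker or sixteen, which is the property being praised above.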
Fork isn't slow or painful. And if you think shared memory is a bad way to communicate, you REALLY won't like threads.
Finally! A year of moderation! Ready for 2019?
You're thinking too simply. A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:
While you're playing a game, more programs are running in the background: anti-virus, defrag, email, Google Desktop, etc. Also, any proper, modern game splits its tasks, e.g. game AI, physics, etc.
So dual-core is definitely a huge step up from single. So, no, users don't want single-core; they want a faster, more responsive PC, which NOW is dual-core. In a few years it will be quad-core. Most users now hardly benefit from quad-core.
"But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores."
Yeah, I agree. There are a few rare types of software that are naturally parallel or deal with concurrency out of necessity, such as GUI applications, server applications, data-crunching jobs, and device drivers, but basically every other kind of software is naturally single-threaded.
Wait....
Sarcasm aside, few computations are naturally parallelizable, but desktop and server applications carry out many computations that can be run concurrently. For a long time it was normal (and usually harmless) to serialize them, but these days it's a waste of hardware. In a complex GUI application, for example, it's probably fine to use single-threaded serial algorithms to sort tables, load graphics, parse data, and check for updates, but you had better make sure those jobs can run in parallel, or the user will be twiddling his thumbs waiting for a table to be sorted while his quad-core CPU is "pegged" at 25% crunching on a different dataset. Or worse: he sits waiting for a table to be sorted while his CPU is at 0% because the application is trying to download data from a server.
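A minimal sketch of that scenario in Python: each job is an ordinary single-threaded serial routine, but the two are started side by side so the table sort never queues behind the network wait. The function names are hypothetical and `check_for_updates` only sleeps to stand in for a server round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sort_table(rows):
    # Plain serial sort; nothing parallel inside the job itself.
    return sorted(rows)

def check_for_updates(delay=0.2):
    # Stand-in for a network round trip (no real server is contacted).
    time.sleep(delay)
    return "up to date"

def run_concurrent(rows):
    with ThreadPoolExecutor(max_workers=2) as pool:
        update = pool.submit(check_for_updates)  # starts waiting on the "server"
        table = pool.submit(sort_table, rows)    # runs without waiting for it
        # The sorted table is ready long before the update check returns.
        return table.result(), update.result()
```

Run serially in that order, the user would wait out the whole download delay before seeing a sorted table.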
Your example of building construction is actually a good example in favor of concurrency. Construction is like a complex computation made of smaller computations that have complicated interdependencies. A bunch of different teams (like cores) work on the building at the same time. While one set of workers is assembling steel into the frame, another set of workers is delivering more steel for them to use. Can you imagine how long it would take if these tasks weren't concurrent? Of course, you have to be very careful in coordinating them. You can't have the construction site filled up with raw materials that you don't need yet, and you don't want the delivery drivers sitting idle while the construction workers are waiting for girders. I'm sure the complete problem is complex beyond my imagination. By what point during construction do you need your gas, electric, and sewage permits? Will it cause a logistical clusterfuck (contention) if there are plumbers and electricians working on the same floor at the same time? And so on ad infinitum. Yet the complexity and inevitable waste (people showing up for work that can't be done yet, for example) are well worth having a building up in months instead of years.
So the chip companies are generally going to end up spending process improvements by making chips cheaper, rather than more complex?
Probably. Cheaper, and less power-hungry. For the past 50 years we've had a set of cycles where computers get dedicated hardware for some task, then the general purpose hardware gets fast enough to run it and the dedicated hardware goes away, then the cycle repeats with some other algorithm (sound, 2D video, and so on). The side-effect of this is that it also consumes a lot more power. For any algorithm, you can design dedicated hardware that executes it with less power than a general-purpose CPU. The DSP on something like an OMAP3 can decode MP3 audio in under 50mW; even something like the Atom is going to struggle to get within two orders of magnitude of this.
This wasn't a problem for desktop PCs, because they were plugged into the mains and no one has itemised electricity bills, so no one notices the difference between a 20W and a 100W CPU. In a laptop or palmtop, the difference between 250mW (a typical ARM Cortex A8 SoC) and 20W (Atom + a cheap chipset) can be several hours of battery life. People are starting to expect 10 hours of battery life from portables, and doing this with a small battery requires a lot of dedicated silicon that can be turned off when not in use and draw small amounts of power when executing the task it was designed for.
I expect the future of CPUs will be heterogeneous multicore. In a way, that's the present of CPUs too; you can consider the FPU and vector unit as separate, specialised, cores (although they lack separate control instructions, so it's stretching it slightly).
I am TheRaven on Soylent News
It's posts like these that make me think that I'm the only one with 7 programs on the task bar, 12 in the system tray, assorted server processes, and 32 tabs open in Firefox (come on, 1 thread per tab!!). It doesn't much matter to me if each of these parts is not multithreaded, as long as the OS is smart enough to put active threads on different cores.
"This is the sort of thing I like about Apple's 'Grand Central'."
What's this 'grand central' thing? From a few brief Google searches it appears to be a framework for using graphics shaders to offload number crunching to the video card. It'd be nice if they'd stick (at least for technical audiences) to slightly more descriptive and less grandiose labels.
<rant>
That's always been my main peeve with Apple: they give opaque, grandiloquent names to standard technologies, make ridiculous performance claims, then set their foaming fanboys loose to harass those of us who just want to get the job done. Remember "AltiVEC" (which my friend swore could burn a picture of Jesus's toenails onto a piece of toast on the far side of the moon with a laser beam comprised purely of blindingly fast array calculations), which turned out to just be a slightly better MMX-like SIMD addon?
Or the G3/G4 processors, which led us to be breathlessly sprayed with superlatives for years until Apple ditched them for the next big thing - Intel processors! Us stupid, drone-like "windoze" users would never see the genius in using Intel proce... oh wait. No, no wait. We got the same "oooh the Intel Mac is 157 times faster than an Intel PC" for at least six months until 'homebrew' OSX finally proved that the hardware is exactly the friggin same now. For a while, thank God, they've been reduced to lavishing praise on the case design and elegant headphone plug placement. It looks like that's coming to an end, though.
</rant>
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.