Windows and Linux Not Well Prepared For Multicore Chips
Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much in the way of performance gains from Windows 7 (or Linux) from eight-core chips that come out from Intel this year. "For systems going beyond quad-core chips, the performance may actually drop beyond quad-core chips. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."
Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.
Multiple OSes sharing the same cores. Multiple apps running on the different OSes, and working together.
Which can also be used to provide fault tolerance... if one of the worker apps fails, or even one of the OSes fails, your processor capability is reduced, a worker app in a different OS takes over, use checkpointing procedures, and shared state, so the apps don't even lose data.
You should even be able to shutdown a virtual OS for windows updates without impact, if the apps that arise get designed properly...
Too bad BeOS died. One of the axioms the developers had was 'the machine is a multi processor machine', and everything was built to support that.
Seems like they were 15 years ahead of their time. But, on the other hand, too late to establish an other OS in a saturated market. Pity, really.
The onus may ultimately lie with developers to bridge the gap between hardware and software to write better parallel programs......They should open up data sheets and study chip architectures to understand how their code can perform better, he said.
Here's the problem, most programs spend 99% of its time waiting. MOST of that is waiting for user input. Part of it is waiting for disk access (as mentioned in the AnandTech story, the best thing you can do to speed up your computer is get a faster hard drive/SSD). A miniscule part of it is spent in the processor. If you don't believe me, pull out a profiler and run it on one of your programs, it will show you where things can be easily sped up.
Now, given that the performance of most programs is not processor bound, what is there to gain by parallelizing your program? If the performance gain were really that significant, I would already be writing my program with threads, even with the tools we have now. The fact of the matter is in most cases, there is really no point to writing your program in a parallel manner. This is something a lot of the proponents of Haskell don't seem to understand, that even if their program is easily paralellizable, the performance gain is not likely to be noticeable. Speeding up hard drives will make more of a difference to performance in most cases than adding cores.
I for one am certainly not going to be reading chip data sheets unless there's some real performance benefit to be found. If there's enough benefit, I may even write parts in assembly, I can handle any ugliness. But only if there's a benefit from doing so.
Qxe4
No, it's not about adaptation. The whole approach currently taken is completely, outright on-its-head wrong.
To begin with, I don't believe the article about the systems being badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.
But more importantly, many programming tasks simply aren't meaningful to break up into such units of granularity is OS-level threads. Many programs would benefit from being able to run just some small operations (like iterations of a loop) in parallel, but just the synchronization work required to wake up even a thread from a pool to do such a thing would greatly exceed the benefit of it.
People just think about this the wrong way. Let me re-present the problem for you: CPU manufacturers have been finding it harder to scale the clock frequencies of CPUs higher, and therefore they start adding more functional units to CPUs to do more work per cycle instead. Since the normal OoO parallelization mechanisms don't scale well enough (probably for the same reasons people couldn't get data-flow architectures working at large scales back in the 80's), they add more cores instead.
The problem this gives rise to, as I stated above, is that the unit of parallelism gained by more CPUs is to large to divide the very small units of work that exist among. What is needed, I would argue, is a way to parallelize instructions in the instruction set itself. HP's/Intel's EPIC idea (which is now Itanium) wasn't stupid, but it has a hard limitation on how far it scales (currently four instructions simultaneously).
I don't have a final solution quite yet (though I am working on it as a thought project), but the problem we need to solve is getting a new instruction set which is inherently capable of parallel operation, not on adding more cores and pushing the responsibility onto the programmers for multi-threading their programs. This is the kind of the the compiler could do just fine (even the compilers that exist currently -- GCC's SSA representation of programs, for example, is excellent for these kinds of things), by isolating parts of the code in which there are no dependencies in the data-flow, and which could therefore run in parallel, but they need the support in the instruction set to be able to specify such things.
Perl has excellent support for building threaded applications. See http://perldoc.perl.org/threads.html . I code multi-threaded apps in perl all the time and they utilize my quad-code very efficiently - in fact, my biggest hassle with multithreading is keeping the CPU cooled! There's also a threads::shared module (http://perldoc.perl.org/threads/shared.html) for handling locks, etc. I'd be hard pressed to imagine better language support for threading. Hardware, operating systems, and a lot of languages support threading. Granted, it isn't always easy/possible/worth it, but as things currently stand, the only bottleneck is programmers who are too lazy to design their algorithms for parallel execution.