Inside Intel's $20M Multicore Research Program
An anonymous reader writes "You may have heard about Intel's and Microsoft's efforts to finally get multi-core programming into gear so that there actually will be developers who can program all those fancy new multicore processors, which may have dozens of cores on one chip within a few years. TG Daily has an interesting article about the project, written by one of the researchers. It looks like there is a lot of excitement around the opportunity to create a new generation of development tools. Let's hope that we will soon see software that can exploit those 16+ core babies. 'The problem of multi-core programming is staring at us right now. I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth. It calls for a much closer collaboration between the centers and the companies than it appears at first sight.'"
./configure --num-cores=16
The thing is, most PCs have plenty of computing power as a single core system. The hard sell is getting people to upgrade those machines mainly used for email and browsing and video playback. I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware. Unless the next version of Windows requires a core dedicated to the OS or something in the future.
“Common sense is not so common.” — Voltaire
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
Software that will exploit 16+ cores already exists. The problem is, it is not consumer (home/office) software. There does not yet exist an application that people use that really needs multiple cores. Video encoding is getting there, but most people will never use it.
Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
It's only you peasants that persist in using old-hat Wintel stuff that are so last-year. Get with it, people! You too could be running NetBSD on your toaster (it will probably outperform Windows Vista on a 4-core Pentium anyway). Hell, it might even eat Nandos peri-peri Vista for breakfast!
Sent from my ASR33 using ASCII
SMT processors of this type are only useful for accelerating a certain type of problem set, and useless for most general computing.
We've had SIMD multicore PCs forever, and they're useless as desktops. I write this from a quad Xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle. It's been like that for more than a decade, and it'll be like that until MIMD hits the street with a whole new paradigm of programming languages behind it. A handful of C compiler #pragma directives from Intel isn't going to make this work.
It's not simply a matter of "coders don't know how to do it." The issue is that these multi-core "general purpose" CPUs are only really useful for a fairly limited set of specific problems.
E.g., writing a game engine with a video thread, an audio thread and an input thread still leaves 13 cores idle. You really can't thread those much further (the ridiculously parallel problem of rendering is handled by the GPU).
Simply starting processes on different procs doesn't help all that much, since they all fight over memory and I/O time. The point of diminishing returns is reached fairly quickly.
But hey, if all you do is run Folding@home so you can compare your e-cock with the other kids on hardextremeoverclockermegahackers.com, well I have some good news!
As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.
Savain says it best at rebelscience.org: "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."
I don't need no instructions to know how to rock!!!!
The structure of VHDL is inherently parallel as all processes (blocks of hardware) run at the same time. Only the code within the processes is evaluated sequentially (in most cases).
Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?
Excellent suggestion. This is precisely what the COSA software model is about. A pulsed neural network is my preferred metaphor for an ideal model of parallel computing. Intel and the others are on the verge of losing billions of dollars because they are already deeply committed to the hard to program multithreading model, a complete failure even after decades of research. To find out why multithreading is not part of the future of parallel programming, read Nightmare on Core Street.
Forget software not being written for multi-cores, the entire infrastructure around the computer needs to "go wide" for massive parallelism, not just the software. This includes disk, memory, front-side bus, etc.
I'm doing highly concurrent projects (grid computing) for my company and we're finding that some things parallelize just fine, but others simply move the pain and bottleneck to a piece of infrastructure that hasn't quite caught up yet.
For example, my laptop has a dual-core 2.2 GHz processor, which you'd think is great for development. It's no better than a single CPU machine because my disk IO light is on all the time. IntelliJ pounds the disk. Maven and Ant pound the disk. Outlook pounds the disk. Even surfing the web puts pages into disk cache, so browsing while building a project is slow. Until I get a SCSI drive, I'm still limited on disk IO, so those extra cores don't help that much.
All the cores are great on the server, though. I've recently completed a massive integration project where I grid-enabled my company's enterprise apps. All those cores running grid nodes are giving us very high throughput. Our next bottleneck is the database (all those extra grid nodes pounding away at another bottleneck resource...)
Terracotta Server as a Message Bus. It's been a very interesting project.
You mean something like the parallel sort in libstdc++'s parallel mode (__gnu_parallel::sort), available since GCC 4.3.0?
One of several parallelised standard algorithms.
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Intel's been doing that (to some degree) since the Pentium, and they increased it a lot in the Pentium Pro/Pentium II. It works reasonably well up to a point (modern chips typically execute an average of two instructions per clock cycle) but definitely has limits.
Compilers that automatically detect when instructions can be executed in parallel have been around for years. Cray had vectorizing compilers by the late 1970s, and within rather specific limits, they worked perfectly well. Just for example, if you wrote a loop like:
they'd break the loop down into four actual executions of a loop, each of which worked on 64 items in parallel. It had independent execution units, so at a given time it'd normally be loading one set of 64 items into one set of registers, executing multiplications on a second set of 64 items, and storing results from a third set of 64 registers.
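The loop from the post above did not survive in this thread, so the following is only a rough sketch of the idea under stated assumptions: a 256-element elementwise multiply and a vector length of 64, which matches the "four executions of 64 items" described. The function names are made up for illustration, and Python is used purely to show the chunking (often called strip mining). The overlapped load/multiply/store pipeline is not modeled.

```python
def plain_multiply(b, c):
    """The scalar loop as a programmer might write it: one element per step."""
    return [x * y for x, y in zip(b, c)]


def strip_mined_multiply(b, c, vector_length=64):
    """Same result, computed one strip of `vector_length` elements at a time.

    vector_length=64 is an assumption taken from the 64-item figure in the post.
    """
    n = len(b)
    a = [0] * n
    strips = 0
    for start in range(0, n, vector_length):
        end = min(start + vector_length, n)
        # On a vector machine this whole slice would be a single vector
        # load, a single vector multiply and a single vector store.
        a[start:end] = [x * y for x, y in zip(b[start:end], c[start:end])]
        strips += 1
    return a, strips


if __name__ == "__main__":
    b = list(range(256))
    c = [2] * 256
    result, strips = strip_mined_multiply(b, c)
    print(strips, result == plain_multiply(b, c))
```

With 256 elements and a vector length of 64 the outer loop runs four times, and the result is identical to the scalar loop because no iteration depends on another.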
That has a couple of problems though. First of all, if you're not careful, it's pretty easy to create loops with (apparent) dependencies from one iteration to the next, so the compiler can't parallelize the code. Second, this works well for vector processors, but probably not nearly so well for a large number of completely independent processors (which have higher communication overhead, meaning that starting up things to happen in parallel is more expensive).
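To make the first problem concrete, here is a small sketch with hypothetical helper names. It compares a loop whose iterations are independent with one that carries a value from each iteration to the next. The dependency here is a real one rather than merely an apparent one, but the effect on a parallelizing compiler is the same: it cannot simply hand separate chunks to separate units.

```python
def scale(xs, k):
    """Independent iterations: each output depends only on its own input."""
    return [k * x for x in xs]


def running_total(xs):
    """Loop-carried dependency: each output needs the previous output."""
    out = []
    total = 0
    for x in xs:
        total += x
        out.append(total)
    return out


def chunked(fn, xs, chunk):
    """Apply fn to each chunk in isolation, as if the chunks ran on
    separate units that never talk to each other."""
    out = []
    for start in range(0, len(xs), chunk):
        out.extend(fn(xs[start:start + chunk]))
    return out


if __name__ == "__main__":
    data = list(range(1, 9))
    # Chunking is safe for the independent loop...
    print(chunked(lambda part: scale(part, 3), data, 4) == scale(data, 3))
    # ...but silently wrong for the dependent one.
    print(chunked(running_total, data, 4) == running_total(data))
```

The second comparison fails because the second chunk restarts its total from zero. Getting it right requires passing the carried value between chunks, which is exactly the communication cost the post says is more expensive on independent processors.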
If you're willing to provide the compiler with a little help, it can do quite a bit more, such as with MPI. The standard MPI interface is pretty low-level, but if you want to do the job in C++, Boost.MPI helps out quite a bit (cheap plug: if you want to know more, consider attending Boostcon '08).
The universe is a figment of its own imagination.
How does Intel persuade people to buy new CPUs if there is no benefit delivered to the buyer?
How does Microsoft sell you new licenses if you don't buy a new computer?
Virtualization at the OS image level only allows you to run multiple different applications. Running more applications at once isn't the primary goal of the average user. They want the application which has the focus of their attention to be slick and fast.
Multicore CPUs do not allow you to run a single application faster. Intel's PC market and Microsoft's empire were built in a feedback loop based on the promise that you can buy a new machine every two years and your applications will run significantly faster. This held true until a few years ago when semiconductor technology hit the heat density wall on ramping up clock frequency. Now, and for the foreseeable future, if you buy a new machine your single threaded application will run NO faster than it did on the old hardware.
That in a nutshell is the multicore problem. Most existing software is not written to exploit parallel processors. Most software developers cannot write correct parallel code. The promise of "buy a new one, it is faster and better!" becomes a lie if the software cannot exploit the extra cores.
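One common way to put numbers on this is Amdahl's law. The post does not name it, so treat it as an outside model brought in for illustration. It assumes a fixed fraction of the run time can be spread perfectly across cores, with no memory or I/O contention. The function name below is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Idealized speedup when `parallel_fraction` of the run time is
    spread perfectly over `cores` and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(fraction, round(amdahl_speedup(fraction, 16), 2))
```

Under this model a purely single-threaded program (fraction 0.0) gains nothing from 16 cores, and even a program that is 90% parallel gets a speedup of only about 6.4 rather than 16.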
No one has the solution to this in their pocket. Threads aren't the answer because they are ridiculously hard to use correctly outside of very coarse grain contexts. Automatically parallelizing compilers have never delivered the goods in the general case. New languages face extremely slow adoption. The answer probably lies in languages, but the adoption problem is an extremely tough nut to crack. The recent successes here are Java (basically C++ with garbage collection) and Javascript+AJAX, which I don't think anyone heralds as a radical leap forward in language design.
I am involved in this research personally, so I'm not just pulling these assertions out of the air.
Outlook I can understand. It needs to flush the emails to disk before replying back to the server.
However, there's no reason why the web browser needs to ensure that the data hits the disk cache right away, so it should be just fine sitting in RAM until the disk frees up. Similarly, IntelliJ, Maven, and Ant should be slow the first time but faster later on since they should be reading from the page cache.
There's no reason for your disk I/O light to be on unless you don't have enough RAM or the disk algorithm in Windows blows chunks.
I do Linux kernel development, and once I do an initial pass through the source tree the whole thing generally stays in RAM and I rarely have to hit the disk. I have 3GB of RAM, but this isn't excessive nowadays.
I've got a dual core machine sitting on the desk before me and the CPU rarely goes above 20% load. The strange thing, though, is that it is still slow when loading programs, and this is due to the hard disk (SATA II) being the bottleneck on my system. I could fix this to some degree with a RAID setup, but the real question is why isn't this being looked at more closely?