Slashdot Mirror


Inside Intel's $20M Multicore Research Program

An anonymous reader writes "You may have heard about Intel's and Microsoft's efforts to finally get multi-core programming into gear so that there actually will be a developer who can program all those fancy new multicore processors, which may have dozens of cores on one chip within a few years. TG Daily has an interesting article about the project, written by one of the researchers. It looks like there is a lot of excitement around the opportunity to create a new generation of development tools. Let's hope that we will soon see software that can exploit those 16+ core babies. 'The problem of multi-core programming is staring at us right now. I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth. It calls for a much closer collaboration between the centers and the companies than it appears at first sight.'"

30 of 187 comments

  1. It's easy by Anonymous Coward · · Score: 5, Funny

    ./configure --num-cores=16

  2. Most PCs are fast enough by OrangeTide · · Score: 2, Insightful

    The thing is, most PCs have plenty of computing power as a single core system. The hard sell is getting people to upgrade those machines mainly used for email and browsing and video playback. I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware. Unless the next version of Windows requires a core dedicated to the OS or something in the future.

    --
    “Common sense is not so common.” — Voltaire
    1. Re:Most PCs are fast enough by pla · · Score: 5, Funny

      Unless the next version of Windows requires a core dedicated to the OS or something in the future.

      So, uh, you haven't Vista yet, I see...

    2. Re:Most PCs are fast enough by betterunixthanunix · · Score: 2, Insightful

      The software currently in use does not involve computationally complex problems, and so the computers appear to have "plenty of computational power." This is likely to be the case for a very long time, but there are useful but complex tasks computers might do. For example, a computer that might interact with its user purely by voice -- more advanced voice and language recognition systems are likely to require significantly more cores and computational power than is currently in wide use. Even more advanced might be a system that can interpret visual data, such as facial expressions. These systems are desirable, but need a lot of work, and won't be widely deployed for a long time (decades at least).

      --
      Palm trees and 8
    3. Re:Most PCs are fast enough by avandesande · · Score: 2, Insightful

      It's a valid point.... if the 'speed' of cars increased at the rate it did during the beginning of the century, we'd be driving 400mph cars around.
      We are certainly capable of making cars that are that fast, but they wouldn't really be any more useful or provide more utility than a slower car.

      --
      love is just extroverted narcissism
    4. Re:Most PCs are fast enough by KillerCow · · Score: 4, Insightful

      The thing is, most PCs have plenty of computing power as a single core system


      And 640k ought to be enough for anyone.

      I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware.


      My last purchase (6 to 8 months ago) was a "low-end" machine. I chose carefully to make sure that it was low-end and not bargain-basement. It has two cores. I don't think it's even possible to buy a single core machine through mainstream channels anymore. Today's low-end (multi-core) is more than adequate for most users to use over the next few (read: four) years.

      Unless the next version of Windows requires a core dedicated to the OS or something in the future.


      You do not understand how the scheduler works.
    5. Re:Most PCs are fast enough by peragrin · · Score: 3, Insightful

      Yes, but voice processing is done best by dedicated hardware rather than generic. Would a voice chip that can do that processing and only that processing be far more efficient? Call it the VPU; it can go next to the GPU and PPU, or it can be one of the 8 cores surrounding a Cell processor. The trend that generic processors can do everything will end. Maybe we'll get a plug and pray architecture where you can pick which cores you want installed on your system.

      --
      i thought once I was found, but it was only a dream.
    6. Re:Most PCs are fast enough by play_in_traffic · · Score: 2, Insightful

      The thing is, most PCs have plenty of computing power as a single core system. . . . Rather than multi core technology resulting in elegant new software to take advantage of it, I suspect that software will get worse (think loop until done, rather than schedule an interrupt). Faster processors have not made software better; rather, they have resulted in an abundance of bad software!
    7. Re:Most PCs are fast enough by pete-classic · · Score: 2, Insightful

      No production car can even approach 400mph. Not even close. You'd be doing very well to spend half a million on a "production car" that can crack 400kmph.

      That leaves out the questions of range (you'd be lucky to get three miles to the gallon), and, you know, being able to actually maneuver on the public roads at that speed.

      You're nuts.

      -Peter

    8. Re:Most PCs are fast enough by OrangeTide · · Score: 2, Informative

      mpeg4 decompression is far more complex than voice recognition. The processing involved is simply not that great, even for "more advanced voice and language recognition". The difficulty lies in better algorithms to do it. Turns out dynamic voice control and interpretation is not something that can be brute forced.

      Game physics needs computational power, but I'm not considering game systems.

      Scientific and Engineering projects need computational power and benefit from cost reduction in high performance processing.

      The home user differs. I suspect dual core CPUs are just a way for Intel to sell us twice as many CPUs as we really need.

      --
      “Common sense is not so common.” — Voltaire
    9. Re:Most PCs are fast enough by PitaBred · · Score: 3, Insightful

      We've gotta get the bandwidth before 1080p is even remotely possible for video mail. The thing is, for the VAST majority of people, there is no killer app that will require an upgrade right now. A low-end machine will push 1080p in H.264 no problem. A 50MP picture of junior would again require more bandwidth, and a bigger monitor. Not a faster machine.

    10. Re:Most PCs are fast enough by OrangeTide · · Score: 3, Insightful

      And 640k ought to be enough for anyone.

      Funny you quoted my response to that issue immediately after: "I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware."

      I don't think it's even possible to buy a single core machine through mainstream channels anymore.

      Conroe-L's are still shipping. And Intel has a single core ultra low power chip on the horizon designed to compete with ARM. Your phone, PDA, heart monitor, etc. won't be symmetric multiprocessor any time soon.

      You do not understand how the scheduler works.

      Xbox 360 already works this way. Three cores: 2 for the game, 1 for the OS.

      As a professional kernel developer, I realize that locking cores into specific tasks is a lot easier than writing a general purpose scheduler that performs equivalently.
      --
      “Common sense is not so common.” — Voltaire
    11. Re:Most PCs are fast enough by Belial6 · · Score: 3, Insightful

      "Get an apartment near your work."

      Terrible, terrible idea. Definitely not thought out.

    12. Re:Most PCs are fast enough by pjabardo · · Score: 3, Informative

      Actually the drag increases as the square of velocity: F = 1/2 * rho * Cd * A * V^2
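
      As a rough illustration of that formula, here is a minimal sketch. The function name `drag_force` and its parameter names are made up for this example; any consistent unit system works.

```cpp
#include <cassert>

// Drag force from the formula in the comment above:
//   F = 1/2 * rho * Cd * A * V^2
// cd   : drag coefficient (dimensionless)
// area : frontal area
// rho  : density of the fluid (air)
// v    : speed
// With SI units (kg/m^3, m^2, m/s) the result is in newtons.
double drag_force(double cd, double area, double rho, double v) {
    assert(cd >= 0.0 && area >= 0.0 && rho >= 0.0);
    return 0.5 * rho * cd * area * v * v;
}
```

      Because V enters squared, calling `drag_force` with twice the speed and everything else unchanged returns roughly four times the force.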

  3. Will more cores help me decipher this run-on? by alta · · Score: 4, Funny

    I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth.

    Maybe my brain needs a new compiler. This must be a multi-core sentence.
    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  4. Multicore Programs by Ironsides · · Score: 3, Insightful

    Software that will exploit 16+ cores already exists. The problem is, it is not consumer (home/office) software. There does not yet exist an application that people use that really needs multiple cores. Video encoding is getting there, but most people will never use it.

    --
    Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
  5. Sun? by Anne+Thwacks · · Score: 3, Funny
    Of course some of you will know that Sun have had 8/16/32 cores for quite a while, and that Solaris, *BSD, and probably even Linux support this stuff just fine.

    It's only you peasants that persist in using old-hat Wintel stuff that are so last-year. Get with it people! You too could be running NetBSD on your toaster (it will probably outperform Windows Vista on a 4-core Pentium anyway). Hell it might even eat Nandos peri-peri Vista for breakfast!

    --
    Sent from my ASR33 using ASCII
    1. Re:Sun? by GreggBz · · Score: 2, Informative

      Of course some of you will know that Sun have had 8/16/32 cores for quite a while, and that Solaris, *BSD, and probably even Linux support this stuff just fine.


      The NT kernel has supported SMP for 10 years. So what?

      It's all about the applications. Sure, there's some development tools in *nix for multicore. I doubt they are efficient and accessible though. Can y'all tell me how great GCC is with 16 cores and thread level parallelism? I'm sure some academic and or low level solutions exist everywhere. However, it's undoubtedly a PITA whatever platform you work with. Everyone could use better tools for the future. Especially for making desktop apps.

  6. Show me the money Intel. by stratjakt · · Score: 5, Insightful

    SMT processors of this type are only useful for accelerating a certain type of problem set, and useless for most general computing.

    We've had SIMD multicore PC's forever, and they're useless as desktops. I write this from a quad xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle. It's been like that for more than a decade, it'll be like that until MIMD hits the street with a whole new paradigm of programming languages behind it - a handful of C compiler #pragma directives from intel isn't going to make this work.

    It's not simply a matter of "coders don't know how to do it." It's a matter of these multi-core "general purpose" CPUs are only really useful for a fairly limited set of specific problems.

    E.g., writing a game engine with a video thread, audio thread and an input thread still leaves 13 cores idle. You really can't thread those much farther (the ridiculously parallel problem of rendering is handled by the GPU).

    Simply starting processes on different procs doesn't help all that much, since they all fight over memory and I/O time. The point of diminishing returns is reached fairly quickly.
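
    One common way to put a number on that diminishing return is Amdahl's law, which the comment does not name. The sketch below is an assumption-laden model, not a measurement: it treats the time spent contending for memory and I/O as a fixed serial part, and the function name `amdahl_speedup` is made up for illustration.

```cpp
#include <cassert>

// Amdahl's law: if a fraction p of the run time can be spread perfectly
// across n cores and the rest stays serial, the best possible speedup is
//   1 / ((1 - p) + p / n).
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

    Under this model a workload that is only half parallelizable gains a bit under 1.9x from 16 cores (`amdahl_speedup(0.5, 16)`), and can never reach 2x no matter how many cores are added.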

    But hey, if all you do is run Folding@home so you can compare your e-cock with the other kids on hardextremeoverclockermegahackers.com, well I have some good news!

    As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.

    Savain says it best at rebelscience.org: "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Show me the money Intel. by everphilski · · Score: 4, Insightful

      a handful of C compiler #pragma directives from intel isn't going to make this work.

      That's OpenMP, and depending on the program, it can work wonders. In an hour I parallelized 90% of a finite element CFD code with it. Yes, it sucks for fine-grained parallelization.
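
      For anyone who has not seen the pragma approach, here is a minimal sketch. It assumes a compiler with OpenMP enabled (for GCC, the `-fopenmp` flag); without it the pragma is ignored and the loop just runs serially. The function `dot` and its data are made up for illustration and have nothing to do with the CFD code mentioned above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: dot product of two equal-length vectors.
// Every iteration is independent, so OpenMP can split the loop across
// threads. reduction(+:sum) gives each thread a private partial sum and
// adds the partial sums together when the loop finishes.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

      The one-line pragma is the whole change to the serial code, which is why coarse loops like this are quick to convert and fine-grained work is not.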

      Intel's product is Threading Building Blocks, and is not built around pragmas, and is both commercial and OSS. It's pretty slick and will let you do the more fine-grained optimizations.

      It's a matter of these multi-core "general purpose" CPUs are only really useful for a fairly limited set of specific problems.

      Not entirely true, it's just useful for problems that need a processor.

      I write this from a quad xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle.

      ... obviously, you have more processor than you need. I, on the other hand, have a quad core Opteron that is currently over 350% utilization. I tank it almost 24/7.

      the ridiculously parallel problem of rendering is handled by the GPU

      Not for long. Raytracing is making a comeback.

      As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.

      If you can't even tank one core of your Xeon, it's doubtful.

      "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."

      I'd caveat that by saying "threading arbitrary program X is a pain in the ass." There are plenty of useful programs that are easily parallelized.

    2. Re:Show me the money Intel. by Rhys · · Score: 2, Interesting

      The desktop PC should be idle most of the time. User input is really slow and in general the machine is waiting on the user, not the other way around. However, ask yourself whose time is more valuable: the machine you bought for $1,500 that lasts 3 years (at least, that's the hardware update cycle around my work), or the person you pay $150,000 over a similar time frame (give or take on location, entry-level position)? Pay 10% more ($150) for the computer to save the person 0.1% ($150) of their time? That's an even trade at least. That 0.1% of the person's time, by the way, is 28.8 seconds per (8-hour) workday.
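
      The arithmetic in that paragraph can be checked with a few lines. This is only a sketch: the dollar figures and the 8-hour day come from the comment, while the constant and function names are made up for illustration.

```cpp
#include <cassert>

// Figures taken from the comment above.
const double kMachineCost = 1500.0;            // dollars, over about 3 years
const double kSalaryCost = 150000.0;           // dollars, over the same period
const double kWorkdaySeconds = 8.0 * 60 * 60;  // one 8-hour workday

// Extra spend if the machine costs `fraction` more (0.10 means 10%).
double extra_machine_cost(double fraction) {
    assert(fraction >= 0.0);
    return kMachineCost * fraction;
}

// Dollar value of saving `fraction` of the person's time (0.001 means 0.1%).
double time_saved_value(double fraction) {
    assert(fraction >= 0.0);
    return kSalaryCost * fraction;
}

// How many seconds of one workday `fraction` of the day amounts to.
double seconds_per_workday(double fraction) {
    assert(fraction >= 0.0);
    return kWorkdaySeconds * fraction;
}
```

      `extra_machine_cost(0.10)` and `time_saved_value(0.001)` both come out at about 150 dollars, and `seconds_per_workday(0.001)` at about 28.8 seconds, matching the numbers quoted.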

      How often has a site locked up your web browser? How much time do you spend waiting on an on-boot virus scanner (memory, boot sector, enable on-access-scan) to run against your machine? I'm not bothered when beagle fires up an auto-index on a multi-core machine. I never notice the performance hit of it. How long does an entry-level developer spend kicking their shoes back while a compile runs (trivial to parallelize to some degree)?

      Speaking of the developer, if he's writing games, there better be each of the 13 other available cores busy running AIs. The more cycles you can throw at them, the better they can play without blatant cheating. Some games get an exception to this (mostly online MMOs, but also puzzle/etc games) but even there there can be useful things to do. How about voice chat software (MMOs) that does open-mic-feedback analysis and automatically filters out anything it is sending to the speakers?

      Just IMHO. No, desktops can't really currently make good use of an 8+ core machine. The jury is out on quad cores at the moment, but dual-cores are a performance boon over singles.

      --
      Slashdot Patriotism: We Support our Dupes!
  7. Hardware description to parallel programming lang? by Anonymous Coward · · Score: 3, Interesting

    The structure of VHDL is inherently parallel as all processes (blocks of hardware) run at the same time. Only the code within the processes is evaluated sequentially (in most cases).

    Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?

  8. Re:Hardware description to parallel programming la by MOBE2001 · · Score: 2, Interesting

    Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?

    Excellent suggestion. This is precisely what the COSA software model is about. A pulsed neural network is my preferred metaphor for an ideal model of parallel computing. Intel and the others are on the verge of losing billions of dollars because they are already deeply committed to the hard to program multithreading model, a complete failure even after decades of research. To find out why multithreading is not part of the future of parallel programming, read Nightmare on Core Street.

  9. Moving the bottleneck... by MarkEst1973 · · Score: 4, Interesting

    Forget software not being written for multi-cores, the entire infrastructure around the computer needs to "go wide" for massive parallelism, not just the software. This includes disk, memory, front-side bus, etc.

    I'm doing highly concurrent projects (grid computing) for my company and we're finding that some things parallelize just fine, but others simply move the pain and bottleneck to a piece of infrastructure that hasn't quite caught up yet.

    For example, my laptop has a dual-core 2.2GHz processor, which you'd think is great for development. It's no better than a single CPU machine because my disk IO light is on all the time. IntelliJ pounds the disk. Maven and Ant pound the disk. Outlook pounds the disk. Even surfing the web puts pages into disk cache, so browsing while building a project is slow. Until I get a SCSI drive, I'm still limited on disk IO, so those extra cores don't help that much.

    All the cores are great on the server, though. I've recently completed a massive integration project where I grid-enabled my company's enterprise apps. All those cores running grid nodes is giving us very high throughput. Our next bottleneck is the database (all those extra grid nodes pounding away at another bottleneck resource...)

    Terracotta Server as a Message Bus. It's been a very interesting project.

  10. Re:Multi-threaded qsort() anyone? by Yokaze · · Score: 3, Informative

    You mean something like parallel_sort in libstdc++, since GCC 4.3.0?

    One of several parallelised standard algorithms.
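
    To give a feel for what a parallelised sort does under the hood, here is a hand-rolled sketch. It is not the libstdc++ parallel mode and makes no claim about how that library is implemented. The helper name `two_way_parallel_sort` is made up, and the code assumes OpenMP is enabled (for GCC, `-fopenmp`); without it the two halves are simply sorted one after the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sort each half of `v` in its own OpenMP section,
// then merge the two sorted halves into one sorted range.
void two_way_parallel_sort(std::vector<int>& v) {
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    #pragma omp parallel sections
    {
        #pragma omp section
        std::sort(v.begin(), mid);   // first half
        #pragma omp section
        std::sort(mid, v.end());     // second half
    }
    // The merge step is serial; only the two sorts can overlap.
    std::inplace_merge(v.begin(), mid, v.end());
    assert(std::is_sorted(v.begin(), v.end()));
}
```

    This only ever uses two threads and leaves the merge serial, so a real library has to do considerably more work to scale to many cores.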

    --
    "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
  11. Re:stupid much? by Jerry+Coffin · · Score: 4, Informative

    Instead of trying to convince everyone on Earth to change all existing software, why doesn't Microsoft just make the next version of Windows have a process handler that can process single threads on multiple cores at once? Actually technically I think Intel could do that internally on their processors too sort of like RAID for cores.


    Intel's been doing that (to some degree) since the Pentium, and they increased it a lot in the Pentium Pro/Pentium II. It works reasonably well up to a point (modern chips typically execute an average of two instructions per clock cycle) but definitely has limits.

    Compilers to automatically detect when instructions can be executed in parallel have been around for years. Cray had vectorizing compilers by the late 1970's, and within rather specific limits, they worked perfectly well. Just for example, if you wrote a loop like:

    for (int i=0; i<256; i++)
    a[i] = b[i] * c[i];

    they'd break the loop down into four actual executions of a loop, each of which worked on 64 items in parallel. It had independent execution units, so at a given time it'd normally be loading one set of 64 items into one set of registers, executing multiplications on a second set of 64 items, and storing results from a third set of 64 registers.
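
    Written out by hand, that transformation looks roughly like the sketch below. This is scalar C++ standing in for vector hardware, not actual Cray compiler output. The names `kN`, `kVectorLen` and `multiply_strip_mined` are made up; the sizes 256 and 64 come from the description above.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kN = 256;         // trip count of the original loop
constexpr std::size_t kVectorLen = 64;  // elements handled per vector operation

// The outer loop runs four times (256 / 64). Each pass of the inner loop
// models one 64-wide vector multiply; on a vector machine the inner loop
// would be a handful of vector instructions rather than 64 scalar ones.
void multiply_strip_mined(double* a, const double* b, const double* c) {
    static_assert(kN % kVectorLen == 0, "this sketch has no remainder strip");
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t base = 0; base < kN; base += kVectorLen) {
        for (std::size_t lane = 0; lane < kVectorLen; ++lane) {
            a[base + lane] = b[base + lane] * c[base + lane];
        }
    }
}
```

    The result is identical to the original loop; only the order in which the work is grouped changes.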

    That has a couple of problems though. First of all, if you're not careful, it's pretty easy to create loops with (apparent) dependencies from one iteration to the next, so the compiler can't parallelize the code. Second, this works well for vector processors, but probably not nearly so well for a large number of completely independent processors (which have higher communication overhead, meaning that starting up things to happen in parallel is more expensive).
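
    The difference between a loop that parallelizes and one with a dependency between iterations can be shown in a few lines. This is a sketch with made-up function names. It shows a real dependency; an "apparent" one typically arises when the compiler cannot prove that two pointers refer to different memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: a[i] depends only on b[i] and c[i], so the
// elements can be computed in any order, or all at the same time.
void multiply_elementwise(std::vector<double>& a,
                          const std::vector<double>& b,
                          const std::vector<double>& c) {
    assert(a.size() == b.size() && a.size() == c.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] * c[i];
    }
}

// Loop-carried dependency: a[i] needs a[i - 1] from the previous
// iteration, so the iterations cannot simply be run side by side.
void running_product(std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    if (a.empty()) return;
    a[0] = b[0];
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i] = a[i - 1] * b[i];
    }
}
```

    The two loops look almost the same in source form, which is why it is easy to write the second kind by accident.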

    If you're willing to provide the compiler with a little help, it can do quite a bit more, such as with MPI. The standard MPI interface is pretty low-level, but if you want to do the job in C++, Boost.MPI helps out quite a bit (cheap plug: if you want to know more, consider attending Boostcon '08).
    --
    The universe is a figment of its own imagination.
  12. How to deliver ever improving performance? by gothmogged · · Score: 2, Interesting

    How does Intel persuade people to buy new CPUs if there is no benefit delivered to the buyer?

    How does Microsoft sell you new licenses if you don't buy a new computer?

    Virtualization at the OS image level only allows you to run multiple different applications. Running more applications at once isn't the primary goal of the average user. They want the application which has the focus of their attention to be slick and fast.

    Multicore CPUs do not allow you to run a single application faster. Intel's PC market and Microsoft's empire were built in a feedback loop based on the promise that you can buy a new machine every two years and your applications will run significantly faster. This held true until a few years ago when semiconductor technology hit the heat density wall on ramping up clock frequency. Now, and for the foreseeable future, if you buy a new machine your single threaded application will run NO faster than it did on the old hardware.

    That in a nutshell is the multicore problem. Most existing software is not written to exploit parallel processors. Most software developers cannot write correct parallel code. The promise of "buy a new one, it is faster and better!" becomes a lie if the software cannot exploit the extra cores.

    No one has the solution to this in their pocket. Threads aren't the answer because they are ridiculously hard to use correctly outside of very coarse grain contexts. Automatically parallelizing compilers have never delivered the goods in the general case. New languages face extremely slow adoption. The answer probably lies in languages, but the adoption problem is an extremely tough nut to crack. The recent successes here are Java (basically C++ with garbage collection) and Javascript+AJAX, which I don't think anyone heralds as a radical leap forward in language design.

    I am involved in this research personally, so I'm not just pulling these assertions out of the air.

  13. why so much disk I/O? by Chirs · · Score: 2, Informative

    Outlook I can understand. It needs to flush the emails to disk before replying back to the server.

    However, there's no reason why the web browser needs to ensure that the data hits the disk cache right away, so it should be just fine sitting in RAM until the disk frees up. Similarly, intellij, maven, and ant should be slow the first time but faster later on since they should be reading from the page cache.

    There's no reason for your disk I/O light to be on unless you don't have enough RAM or the disk algorithm in windows blows chunks.

    I do linux kernel development, and once I do an initial pass through the source tree the whole thing generally stays in RAM and I rarely have to hit the disk. I have 3GB of RAM, but this isn't excessive nowadays.

  14. Faster CPU's are not the problem by EEPROMS · · Score: 2, Insightful

    I've got a dual core machine sitting on the desk before me and the CPU rarely goes above 20% load. The strange thing though is it is still slow when loading programs and this is due to the hard disk (SATA II) being the bottleneck on my system. I could fix this to some degree with a RAID setup but the real question is why isn't this being looked at more closely?

  15. CPU is not bottleneck on desktop by ToasterMonkey · · Score: 2, Interesting

    People need to stop thinking that 'I don't have a program that uses 16 cores (16 real threads), so I don't need a 16 core system.' On a desktop PC, the IO system is going to be the source of contention far more often than the processor(s). How often do most people run several CPU bound tasks simultaneously on a desktop anyway? Extremely rarely.

    Imagine splitting the CPU cycles of 1 core for all these tasks, and sharing them fairly, against splitting the cycles of 2..4..16 cores. If the CPUs you currently have aren't being heavily utilized, then having more of them isn't going to give you any perceptible improvements. This is really a matter of scaling horizontally as opposed to vertically, and they both suit entirely different workloads. The average workload of a desktop PC is shifting slowly in one direction, and not much at all in the other.