Wintel, Universities Team On Parallel Programming

Posted by kdawson on Friday March 14, 2008 @05:14AM from the gator-rays dept.

kamlapati writes in with a followup from the news last month that Microsoft and Intel are funding a laboratory for research into parallel computing at UC Berkeley. The new development is the imminent delivery of the FPGA-based Berkeley Emulation Engine version 3 (BEE3) that will allow researchers to emulate systems with up to 1,000 cores in order to explore approaches to parallel programming. A Microsoft researcher called BEE3 "a Swiss Army knife of computer research tools."

Linus Torvalds & Dave Patterson discuss it on by elwinc · 2008-03-14 05:26 · Score: 3, Interesting

Actually, this is old news. There's a month-old discussion thread on the RWT discussion forum. Berkeley proposes the "thirteen dwarfs": 13 kinds of test algorithms they consider valuable to parallelize. Linus doesn't think the 13 dwarfs correspond well to everyday computing loads. My 2 cents: Intel & others are spending hundreds of millions of bucks per year trying to speed up single-threaded computing, so it's not a bad idea to put a few more million per year into thousand-thread computing.

--
--- Often in error; never in doubt!

Cheap Bastards. by cyc · 2008-03-14 06:19 · Score: 4, Interesting

Rick Merritt, who wrote the lead article, also posted an opinion piece in EE Times lambasting Wintel for their lackluster funding efforts in parallel programming. I thoroughly agree with this guy. To quote:

Wintel should not just tease multiple researchers with a $10 million grant awarded to one institution. They need to significantly up the ante and fund multiple efforts.
Ten million is a drop in the bucket of the R&D budgets at Intel and Microsoft. You have to wonder about who is piloting the ship in Redmond these days when the company can afford a $44 billion bid for Yahoo to try to bolster its position in Web search but only spends $10 million to attack a needed breakthrough to save its core Windows business.

Use your GPU by TheSync · 2008-03-14 06:20 · Score: 4, Interesting

If you have a GeForce 8800 GT, you already have a 112-processor parallel computer that you can program using CUDA.

Re:"stuck with a ...serial programming model" by asills · 2008-03-14 06:28 · Score: 2, Interesting

Threads are harder, just as memory management in C++ is harder than in Java and .NET.

It's the people who really can't program that are having significant trouble with parallelization in modern applications. That's not to say that in the future I won't love to be able to express a solution and have it automatically parallelized, but for the time being creating applications that take advantage of multiple cores well (server apps, not client apps) is not that difficult if you know what you're doing.

Though, like C++ with memory leaking, it is possible to shoot yourself in the foot with a deadlock occasionally.
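For readers who have not hit this, here is a minimal Python sketch of the classic way it happens: two threads take the same two locks in opposite order. The function name, the barriers, and the timeout are all assumptions made for illustration; the timeout is there only so the demo reports the stall instead of hanging forever.

```python
import threading

def demonstrate_lock_order_deadlock(timeout=0.2):
    """Two threads grab two locks in opposite order; neither gets its second lock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)    # makes the bad interleaving certain
    both_tried_second = threading.Barrier(2)  # nobody releases until both have tried
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()            # each thread now holds its first lock
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired       # False means "would have deadlocked"
            both_tried_second.wait()
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second

result = demonstrate_lock_order_deadlock()
print(result)
```

Without the timeout, both threads would wait on each other indefinitely. The usual cure is to have every thread acquire the locks in one agreed order.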

--
-- What did Spock find in Kirk's toilet? The captain's log.

Re:"stuck with a ...serial programming model" by 0xABADC0DA · 2008-03-14 06:55 · Score: 3, Interesting

Setting aside those problems which exhibit no parallelism (for whom there is no solution but a faster CPU really), there are many classes of problems which would benefit enormously from better programming models, which are more efficiently tied to the operating system and hardware rather than going through an OS level threading package.

The programming models we have are just fine. The vast majority of program time is spent in a loop of some kind, but languages that could easily parallelize loops don't. There is no reason why 'foreach' or 'collect' cannot use other processors (whereas 'inorder' or 'inject' would always be sequential). So our programming models are not the problem. The real problem is trying to use them with a 40-year-old operating system design.
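To make the 'collect' versus 'inject' distinction concrete, here is a rough Python sketch. The helper names `parallel_collect` and `inject` are made up for this example, and the thread pool stands in for "other processors". In CPython the global interpreter lock means threads will not actually speed up CPU-bound work, so treat this as an illustration of the structure, not of the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_collect(fn, items, workers=4):
    """'collect': each item is independent, so the work can be spread over workers.
    Results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

def inject(fn, items, initial):
    """'inject': each step needs the previous result, so it runs strictly in order."""
    return reduce(fn, items, initial)

squares = parallel_collect(lambda x: x * x, range(10))
total = inject(lambda acc, x: acc + x, squares, 0)
print(squares, total)
```

The caller's loop body is the same either way; only the runtime underneath decides whether the iterations are spread out.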

Current operating systems could run code in parallel if they stopped scheduling threads for a timeslice on one processor and instead scheduled a timeslice across multiple processors. Take an array of 1000 strings and a regex to match them against. If the program is allocated 10 processors, it can do a simple interrupt and have them start working on 100 strings each. Having the processors already allocated avoids the overhead of switching memory spaces and of scheduling, making this kind of fine-grained parallelism feasible.
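The 1000-strings example can be sketched in today's terms roughly as follows. This is Python with an ordinary OS-level thread pool, which is exactly the heavyweight path being argued against, so it shows only how the work is split, not the proposed scheduler. The function name and the sample data are assumptions.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def match_in_chunks(pattern, strings, workers=10):
    """Split strings into one contiguous slice per worker and match each slice."""
    if not strings:
        return []
    regex = re.compile(pattern)
    chunk = -(-len(strings) // workers)  # ceiling division: 1000 / 10 -> 100
    slices = [strings[i:i + chunk] for i in range(0, len(strings), chunk)]

    def match_slice(part):
        return [bool(regex.search(s)) for s in part]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_slice = list(pool.map(match_slice, slices))
    # Flatten back into one result per input string, in the original order.
    return [hit for part in per_slice for hit in part]

strings = [f"item-{i}" for i in range(1000)]   # hypothetical input
hits = match_in_chunks(r"7$", strings)
print(sum(hits), "of", len(hits), "strings matched")
```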

But the problem here is that most programs will use one or two processors most of the time and all the available processors at other times. And if your parallel operation had to synchronize at some point then you'd have all your other allocated processors doing nothing while waiting for one to finish with its current work. So there is a huge amount of wasted time by allocating a thread to more than one processor.
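A back-of-the-envelope sketch of that waste, in Python. The durations are invented numbers chosen only to show the arithmetic: ten allocated processors, nine of which finish their share in 1 time unit while one takes 3.

```python
def idle_time_at_barrier(durations):
    """Total time the allocated processors spend waiting for the slowest one."""
    slowest = max(durations)
    return sum(slowest - d for d in durations)

durations = [1.0] * 9 + [3.0]          # hypothetical per-processor work times
wasted = idle_time_at_barrier(durations)
# Useful work divided by total processor time held until the sync point.
utilization = sum(durations) / (len(durations) * max(durations))
print(wasted, utilization)
```

With these made-up figures, the nine fast processors idle for 2 units each, so well over half of the allocated processor time does nothing.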

A solution to the unused processor problem is to have a single memory space, and so as a consequence only run typesafe code -- an operating system like JavaOS or Singularity or JXOS. This lets any processor be interrupted quickly to run any process's code in parallel, so CPUs can be dynamically assigned to different threads. Even small loops can be run effectively across many CPUs, and there is no waste from the heavyweight allocations and clunkiness ultimately caused by the separate memory spaces needed to protect C-style programs from each other. This is why it is the operating system, not the programming models, that is the main problem.