MIT Randomizes Tasks To Speed Massive Multicore Processors

Posted by samzenpus on Monday February 2, 2015 @01:05AM from the greased-lightning dept.

itwbennett writes: Researchers at the Massachusetts Institute of Technology have created a data structure that they claim can help large multicore processors churn through their workloads more effectively. Their trick? Do away with the traditional first-come, first-served work queue and assign tasks more randomly. The SprayList algorithm allows processors with many cores to spread out their work so they don't stumble over one another and create bottlenecks that hamper performance.
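
To make the "assign tasks more randomly" idea concrete, here is a minimal single-threaded sketch. It is not the researchers' SprayList implementation, whose internals the summary does not describe. It only illustrates the relaxation itself: instead of always taking the single highest-priority task, a worker takes one of the k highest-priority tasks at random, so several workers asking at once are less likely to fight over the same item. The function name pop_relaxed and the parameter k are hypothetical.

```python
import heapq
import random


def pop_relaxed(heap, k, rng=random):
    """Remove and return one of the k smallest items, chosen at random.

    Illustration only: a real concurrent structure would not rebuild
    the heap on every removal as this sketch does.
    """
    if not heap:
        raise IndexError("pop from an empty queue")
    k = min(k, len(heap))
    candidates = heapq.nsmallest(k, heap)  # the k highest-priority tasks
    choice = rng.choice(candidates)        # pick one of them at random
    heap.remove(choice)                    # remove it from the underlying list
    heapq.heapify(heap)                    # restore the heap property
    return choice


if __name__ == "__main__":
    tasks = list(range(20))                # lower number = higher priority
    heapq.heapify(tasks)
    print([pop_relaxed(tasks, 4) for _ in range(5)])
```

The trade-off is that tasks no longer come out in strict priority order, only in approximately that order.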

Oooh I have an idea by Anonymous Coward · 2015-02-02 01:19 · Score: 5, Funny

3D print it too, that always helps!
Re:I don't know enough about this stuff by mwvdlee · 2015-02-02 01:30 · Score: 3, Insightful

Results don't come in out-of-order. Imagine two variables, A and B, each of which undergoes a number of calculation steps that don't refer to the other variable, e.g. A=A+5/2*13-29 and B=B*B*3+12/N, and which are finally added together as Z=A+B. Normal execution would first do all the calculations for A, then all the calculations for B, then finally Z. Out-of-order execution would calculate both A and B simultaneously, wait for both to finish, then calculate Z. Out-of-order execution involves a lot of this type of waiting, but since it waits only for the slowest calculation rather than for the sum of the slowest and the fastest, it finishes sooner. If things cannot be calculated like this, an out-of-order capable processor will simply do things in order.
At least that's how I understand it at a very abstract level.
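
A rough way to put numbers on the A, B, Z example, assuming every calculation step costs one time unit (real processors are not this uniform) and using hypothetical step counts. The function names are made up for this sketch.

```python
def serial_time(a_steps, b_steps, z_steps=1):
    """All of A, then all of B, then Z: the step counts simply add up."""
    return a_steps + b_steps + z_steps


def overlapped_time(a_steps, b_steps, z_steps=1):
    """A and B run side by side; Z waits only for the slower of the two."""
    return max(a_steps, b_steps) + z_steps


if __name__ == "__main__":
    # Assumed: four steps for A and four steps for B.
    print(serial_time(4, 4))      # 9
    print(overlapped_time(4, 4))  # 5
```

The saving is the length of the shorter chain, which is why it disappears when B cannot start until A is done.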

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Avoiding bottlenecks by Zorpheus · 2015-02-02 01:38 · Score: 3, Interesting

The article says that the SprayList algorithm is faster for many cores than a traditional priority queue, since there are collisions when several cores ask for the top priority task at once.
Couldn't you just distribute the tasks ahead of time, giving every core a new task before its current task is finished?
Also, the article says:

Random assignment has traditionally been frowned upon by those who think a lot about computer processors, the researchers noted in a paper explaining the work. A random scheduling algorithm takes longer to jump around the queue than a conventional one does. Caches can't be used to store upcoming work items. And if a set of tasks needed to perform one job are executed out of order, then the computer needs additional time to reassemble the final results.
I would think these problems are the same for the priority queue that they compare performance to. And I guess there are other ways which avoid these problems, which might produce faster results.
Re:I don't know enough about this stuff by fey000 · 2015-02-02 02:04 · Score: 3, Informative

If I'm not mistaken, you are thinking about branch prediction, not out-of-order execution in an otherwise serial pipe.
To elaborate, OOE deals with computing as much as possible without having to wait for a result first.
Branch prediction is a cache separate from the execution tray that attempts to predict the outcome of an if/switch or other branching evaluation and then load the pipelines to the execution tray with the computations following that branching, since the time it takes to evaluate an if/switch can be long, and without a prediction the cpu would have to stall until the evaluation is complete.
Re:I don't know enough about this stuff by gnupun · 2015-02-02 04:47 · Score: 3, Interesting

It always seemed wrong that out of order execution was more efficient than in order.
Dependency between instructions can temporarily delay execution of some instructions. This can be easily explained with an example:
1) A = A + 1
2) B = A + 2
3) C = D + 3
Above is pseudocode for 3 instructions being executed by a CPU. While instr #1 is being executed, the CPU will also try to execute instr #2, but since instr #2 needs the current value of register A, the CPU waits until instr #1 is complete and the value of A can be used to calculate B. This is why in-order CPUs can be simpler to build, but somewhat slower.
Out-of-order CPUs work around this problem by deferring instr #2. Instead, the CPU executes instr #3, which has no dependency on instr #1 or #2. Therefore an OOO CPU can execute two instructions in parallel in this case, whereas the in-order CPU may not be able to execute multiple instructions in some cases.
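
The three-instruction example can be played through in a toy issue model. This is a sketch under loud assumptions, not a description of any real CPU: two instructions may issue per cycle, every instruction takes one cycle, and only read-after-write dependencies are tracked (as if registers were renamed). The names PROGRAM, dependencies and schedule are hypothetical.

```python
# Each instruction: (label, destination register, source registers).
PROGRAM = [
    ("1", "A", ["A"]),  # A = A + 1
    ("2", "B", ["A"]),  # B = A + 2, needs the A written by instr #1
    ("3", "C", ["D"]),  # C = D + 3, independent of #1 and #2
]


def dependencies(program):
    """For each instruction, the earlier instructions whose result it reads."""
    last_writer = {}
    deps = []
    for index, (_, dest, sources) in enumerate(program):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = index
    return deps


def schedule(program, width=2, in_order=True):
    """Return one list of issued instruction labels per cycle."""
    deps = dependencies(program)
    finished = set()                  # instructions whose results are ready
    pending = list(range(len(program)))
    cycles = []
    while pending:
        issued = []
        for index in pending:
            if len(issued) == width:
                break
            if deps[index] <= finished:
                issued.append(index)
            elif in_order:
                break                 # in-order: stall behind the blocked instr
        cycles.append([program[i][0] for i in issued])
        finished.update(issued)       # results become visible next cycle
        pending = [i for i in pending if i not in issued]
    return cycles


if __name__ == "__main__":
    print(schedule(PROGRAM, in_order=True))   # [['1'], ['2', '3']]
    print(schedule(PROGRAM, in_order=False))  # [['1', '3'], ['2']]
```

Both schedules take two cycles for a program this small. What changes is that the out-of-order schedule fills the second issue slot in the first cycle, so C is ready one cycle earlier.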