MIT Randomizes Tasks To Speed Massive Multicore Processors
itwbennett writes Researchers at the Massachusetts Institute of Technology have created a data structure that they claim can help large multicore processors churn through their workloads more effectively. Their trick? Do away with the traditional first-come, first-served work queue and assign tasks more randomly. The SprayList algorithm allows processors with many cores to spread out their work so they don't stumble over one another, creating bottlenecks that hamper performance.
Couldn't you just distribute the tasks ahead of time, giving every core a new task before its current task is finished?
Also, the article syas:
Random assignment has traditionally been frowned upon by those who think a lot about computer processors, the researchers noted in a paper explaining the work. A random scheduling algorithm takes longer to jump around the queue than a conventional one does. Caches can't be used to store upcoming work items. And if a set of tasks needed to perform one job are executed out of order, then the computer needs additional time to reassemble the final results.
I would think these problems are the same for the priority queue that they compare performance to. And I guess there are other ways which avoid these problems, which might produce faster results.
Dependency between instructions can temporarily delay execution of some instructions. This can be easily explained with an example:
1) A = A + 1
2) B = A + 2
3) C = D + 3
Above is pseudo code of 3 instructions being executed by a CPU. While instr #1 is being executed, the CPU will also try to execute instr #2, but since instr #2 needs the current value of register A, the CPU waits until instr #1 is complete and value of A can be used to calculate B. This is why in-order CPUs can be simpler to create, but be slightly slow.
Out-of-order CPUs work around this problem by deferring instr #2. Instead, the CPU executes instr #3 which has no dependency to instr #1 or #2. Therefore an OOO CPU can execute two instructions in parallel in this case, whereas the in-order CPU may not be able to execute multiple instructions in some cases.