MIT Randomizes Tasks To Speed Massive Multicore Processors
itwbennett writes Researchers at the Massachusetts Institute of Technology have created a data structure that they claim can help large multicore processors churn through their workloads more effectively. Their trick? Do away with the traditional first-come, first-served work queue and assign tasks more randomly. The SprayList algorithm allows processors with many cores to spread out their work so they don't stumble over one another, creating bottlenecks that hamper performance.
It always seemed wrong that out of order execution was more efficient than in order. I'll have to read up on it more. You'd think that the results coming in out of order would mess things up.
"Freedom in the USA is not the ability to do what you want. It is the ability to stop others from doing what THEY want"
3D print it too, that always helps!
For tasks that need low level code optimization I can't see how this wouldn't hurt processing speed. Maybe it's time to add a tiny brain to each CPU capable of learning, in a primitive way, which execute orders work best?
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
...many computing algorithms rely upon randomization for efficiency.
Couldn't you just distribute the tasks ahead of time, giving every core a new task before its current task is finished?
Also, the article syas:
Random assignment has traditionally been frowned upon by those who think a lot about computer processors, the researchers noted in a paper explaining the work. A random scheduling algorithm takes longer to jump around the queue than a conventional one does. Caches can't be used to store upcoming work items. And if a set of tasks needed to perform one job are executed out of order, then the computer needs additional time to reassemble the final results.
I would think these problems are the same for the priority queue that they compare performance to. And I guess there are other ways which avoid these problems, which might produce faster results.
RTFA, it was just an elaborate troll to get this surrealist theme song parody stuck in your head:
https://www.youtube.com/watch?...
Good for CPUs numbering up into the 80's. Sheesh.
SprayList explained here.
The 68000 had one approach to memory management that was to randomly delete a cell, and repeat. It was more efficient than all the "top-down" solutions of the day. Statisticians compare against a "random" approach when trying to determine if an effect is statistically meaningful. They ask "does the data support that approach x yields more than would be done by chance". It would be nice if computer science of this century could catch up to or surpass the basic statistical ideas developed in the late 1800's.
Like a hash table?
Then why did it still beat SprayList at 32+ cores? The only time SprayList beats the competition is at 16 cores.
if you bump into another packet you back off a random wait time. The other sender will back off, also a random wait time. Less likely to bump again.
If you want to read up on this stuff, google on "ibm patent multiple run queues" . . . they have done a lot of work in this area.
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
This is old hat in the CS world that gets re-discovered fairly often: you can increase throughput at the cost of ravaging your latency. For some tasks, this is an acceptable tradeoff -- for others (especially anything interactive) it's completely unacceptable. Moreover, any synchronization point in the program converts the worst-case latency of the previous tasks into limits on throughput, e.g. the time it takes to join a set of N threads is equal to the maximum latency of any single thread in the list.
The best analogy is an elevator (sorry car folks): you can optimize your elevator for throughput by having it always select the closest floor with an active call. The cost, obviously, is that if people are shuffling between floors 1 & 5 a lot, then the poor guy up on 30 might wait a really long time. The throughput is still maximized though, since the elevator can do more operations per unit time by avoiding the long trip to and from floor 30.
In some cases this is fine, in the vast majority of cases you want to ensure that all tasks complete in a more bounded amount of time, even if that reduces the total number of tasks completed per unit time.
I guess the headlines of AMD strategic meeting is gonna me something in the like of "MOAR CORESSS!!!!!"
Elok
If random works better than their best 'smart' allocation algorithm,
then getting a bunch of processors to work nicely together
is going to take a while to sort out.
I have never really 100% understood why it is better for sorting, than say bubble, is there any relation to this approach re core usage?
I have been workong directly and indirectly with massive scale computing for decades:
1) break single threaded tasks into smaller jobs that can be distributed.
2) test that they still work
3) establish a chain of dependencies so that tasks which must be performed in order will.
4) establish pipeline cost and cache coherency cost for each job.
5) distribute tasks who share common data sets to cores which have the lowest latency between them
6) take jobs which depend on output of other jobs and offset their start time so the input data from the other job is available befor it is need in the current job decreasing the cost or synchronization.
8) optimize job distribution so that tasks running in parallel with dependencies will be distributed to cores nearest to one another in the coherency ring.
This is old school thinking. Randomizing should not offer better performance than properly scheduled tasks.
When I eat my dinner I leave the good stuff until last; that way I end the meal with a pleasant taste in my mouth.
Efficiency, certainly. But for every randomized algorithm, there is an unrandomized one which would perform better. Randomization can sometimes lower the upper bound on error in machine learning algorithms, for instance, but this is only because adding noise to the algorithm helps in the case where the environment is smarter than you (and therefore can present to you exactly that data which makes you draw an incorrect conclusion), which is after all the literal worst case and therefore what the 'error upper bound' is about. So, theoretically, if you are developing a machine learning algorithm to play Rock, Paper, Scissors against God, then you may want a source of true quantum random noise, not because 'randomness' or 'chaos' can be used to improve upon some algorithm but because it can act as a poison to a competing smarter algorithm.
There are also cases where adding a source of noise to an algorithm improves the algorithm because some piece of the algorithm is performing worse than random. For instance, a combination lock opening algorithm which repeatedly tries the combination "0-0-0" is doing worse than random; it would be better if it tried a random combination each time. But the best algorithm would be one which learned from past experience, and didn't try combinations already shown not to work; the randomized algorithm is an improvement on the stupid algorithm, but that doesn't mean you're getting useful information out of it.
In fact, there are general proofs that one cannot get work out of randomness, for the same reason you cannot turn ice cubes into water and electricity. The laws of thermodynamics are just as sound when we're talking about information-theoretical entropy. Random noise is maximum entropy, the most expensive information to transmit, because truly random noise is necessarily incompressible. This lack of compressibility has consequences. What it comes down to is, if you're trying to write an algorithm to find the shortest path through a maze, then if you think the sum of all of the digits in your phone number, modulo 3, is too disconnected a factoid to have any useful correlation with the maze you're working on, then why would you expect random noise, which is literally maximum uncorrelation to do any better?