Reverse Multithreading CPUs
microbee writes "The register is reporting that AMD is researching a new CPU technology called 'reverse multithreading', which essentially does the opposite of hyperthreading in that it presents multiple cores to the OS as a single-core processor." From the article: "The technology is aimed at the next architecture after K8, according to a purported company mole cited by French-language site x86 Secret. It's well known that two CPUs - whether two separate processors or two cores on the same die - don't generate, clock for clock, double the performance of a single CPU. However, by making the CPU once again appear as a single logical processor, AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."
If the OS scheduler only know about one core, how in the world would it ever know to set two threads in the execute state simultaniously to take advantage of the extra horsepower. This article is lacking any substantial detail.
.. in this post they reported on a project supposedly aiming at breaking down single threads into multiple threads so as to better utilize core utilization beyond the fourth core.
It supposedly involve Intel. I personally think both rumors are just that, but the timing is curious. Same source behind both? AMD PR people not wanting to lose out in imaginary rumored technology to Intel?
Michel
Fedora Project Contribut
Boy are you screwed.
Even though the trade rags haven't realized it, real life software engineers have been using parallel programming techniques for decades. Sure, apps are optimized for what they run on, so most shrinkwrap software at your local CompUSA probably doesn't have much of that in there, but the author missed the boat already when it comes to "had several years focusing on...".
Better learn to like that parallel programming stuff. It's the way things work.
What AMD appears to be trying isn't the same as superscalar processing, but it might run into a similar problem.
b leCheckedLocking.html shows the problem, which already is an issue on 64-bit hardware).
Where superscalar requires a good dispatcher to minimize branch prediction misses, AMD appears to be making decisions, not about dispatch, but about how to do locking of shared memory (think critical sections).
Critical section prediction might prove less expensive than branch prediction in practice even if they are similar in theory (http://www.cs.umd.edu/~pugh/java/memoryModel/Dou
"AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."
:)?
Even the article writers aren't pretty sure that's possible to do, apparently it's possible to "claim" it though, what isn't
Modern processors, including the Core Duo rely on a complex "infrastructure" that allowed them to execute instructions out of order, if certain requirements are met, or execute several "simple" instructions at once. This is completely transparent to the code that is being executed.
Apparently for this to be possible the commands should not produce results co-dependent of each other, meaning you can't execute out-of-order or at-once instruction that modify the same register for ex.
This is an area where multiple cores could join forces and compute results for one single programming thread as the article suggests.
But you can hardly get twice the performance from two cores out of that.
I'm guessing economic reasons push harder than technical ones.
Sony already assumes that their PS3 chips will have a fault in one of the cores, and simply lock off that section when one is found. One fault no longer kills a chip, though two can render the power unacceptably low.
The cool thing is this scales. If you have a 10cm^2 chip, traditionally your chance of perfection is 1/4th that of a 5cm chip, cutting your yield drastically. But if you have 6 cores on a chip with one dead one, and you want to go to 12, you should get a similar yield for a proportionally similar amount of dead cores.
Cores let you limit damage from manufacturing errors, letting you build bigger chips more cheaply. At least, that's my layman's understanding.
The ______ Agenda
It might be interesting if they took this idea in a slightly different direction. Set it up so the OS detects two CPUs. But, when the OS fails to utilize both CPUs effectively, allow the idle CPU to take some of the active CPU's load. I'm taking this idea from nVidia working on load balancing between graphics and physics in a SLI setup. So in this case the OS gets the best of both worlds, the ability to break tasks off to each CPU and a free boost when it's stuck with a single cpu-limited thread.
I'd suggest x86-secret & the Reg have got the wrong end of the stick here. SMT is running two threads on one core - try taking "reverse hyperthreading" literally. I'd suggest that AMD are looking at running the one same thread in lock-step on two cores simultaneously. This is not about performance, it is about reliability - AMD looking at the market for big iron (running execution cores in lock-step is the kind of hardware reliability you are looking at on mainframe systems).
The behaviour of a CPU core should be completely deterministic. If the two cores are booted up on the same cycle they should make the same set of I/O requests at the same point, and so long as the system interface satisfies these requests identically an on the same cycle, then the cores should have no reason not to remain in sync with each other until the next point that they both should put out the next, identical pair of I/O requests. If the cores every get out of sync with each other, this indicates an error.
Just speculation of course, but I seem to recall AMD looking into this having been rumoured previously.
G.
There are other academic projects that are attempting to do TLS dynamically, in hardware. PolyFlow at Illinois is one, Dynamic Multithreading (mentioned elsewhere in this story) is another, I'm sure there are others.