Reverse Multithreading CPUs
microbee writes "The Register is reporting that AMD is researching a new CPU technology called 'reverse multithreading', which essentially does the opposite of hyperthreading in that it presents multiple cores to the OS as a single-core processor." From the article: "The technology is aimed at the next architecture after K8, according to a purported company mole cited by French-language site x86 Secret. It's well known that two CPUs - whether two separate processors or two cores on the same die - don't generate, clock for clock, double the performance of a single CPU. However, by making the CPU once again appear as a single logical processor, AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."
Multiple cores presented as one sounds familiar. Last time I heard about that, it was just called "superscalar execution". As I understand it, multithreading and multicore were added because CPUs' instruction schedulers were having a hard time extracting parallelism from within a thread.
I believe that one and a half cores, sideways-threaded, is the way to go.
I'll form my OWN solar system! With blackjack! And hookers!
If the OS scheduler only knows about one core, how in the world would it ever know to put two threads in the execute state simultaneously to take advantage of the extra horsepower? This article is lacking any substantial detail.
four cores presented as two?
What's the difference between 'reverse multithreading' (it sounds like having one execution pipeline on a chip with enough hardware for 2 cores) and just adding more Logic/Integer/FP units to a chip?
Didn't they do this on Star Trek once to get more power or something?
I'm thrilled to know that the threaded code I write isn't going to behave that way in hardware.
This would seem to be better for processes designed to only use one CPU, but it then prevents me from coding something in, say, OpenMP, in order to fine tune the parallelisation of my code (which would almost certainly work better than the generic optimizations that they would be putting in the CPU). Admittedly 95+% of programs aren't coded to be parallel, but this would still take away an option that would otherwise be there.
Perhaps there could be a documented way to access both CPUs directly? That may solve the problem.
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Eh? What? The Register is British.
Part of the problem is that we're still writing software using techniques that were designed for single-processor systems. Languages like C and C++ just aren't suited for writing large distributed and/or concurrent programs. It's a shame to see that even languages like Java and C# only have rudimentary support for such programming.
The future lies not with languages such as Erlang and Haskell, but likely with languages heavily influenced by them. Erlang is well known for its use in massively concurrent telephony applications. Programs written in Haskell, and in many other pure functional languages, are free of side effects, so in principle their computations can be executed in parallel without the programmer even having to consider such a possibility.
What is needed is a language that will bring the concepts of Erlang and Haskell together, into a system that can compete head-on with existing technologies. But more importantly, a generation of programmers who came through the ranks without much exposure to the techniques of Haskell and Erlang will need to adapt, or ultimately be replaced. That is the only way that software and hardware will be able to work together to solve the computational problems of tomorrow.
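To make the purity point concrete, here is a small sketch in Python rather than Haskell or Erlang; the function and inputs are hypothetical, chosen only for illustration. Because the function touches no shared state, mapping it over the inputs concurrently gives the same answer as mapping it sequentially, with no locks. Note that CPython threads won't actually speed up CPU-bound work like this; `ProcessPoolExecutor` would be the drop-in choice for real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def collatz_steps(n: int) -> int:
    """A pure function: the result depends only on n and no shared state is touched."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

inputs = list(range(1, 21))

# Because collatz_steps has no side effects, the calls can run in any order,
# or concurrently, without locks, and the results are still identical.
sequential = list(map(collatz_steps, inputs))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(collatz_steps, inputs))

assert parallel == sequential
```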
Hah, yeah right, we started parallel programming just this semester and already I want to kill myself. "May not want to go back"? I'd go back in a heartbeat!
First, they get the software industry's licensing panties in a knot because users only want to pay a license fee for one physical chip instead of paying for each processor on the chip. Now, twisting the panties in the other direction, they want to reverse all that by representing multiple processors as one virtual processor. Would that be covered by a multi- or single-processor license agreement? Do I still get a free wedgie with that one?
The cores thread YOU!
It is pitch black. You are likely to be eaten by a grue.
What I want to know is which of the premises underlying Amdahl's Law they've managed to escape?
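For reference, Amdahl's Law for a fixed workload whose parallelizable fraction is p, run on n cores, gives a speedup of 1 / ((1 - p) + p / n). The sketch below (function names are hypothetical) evaluates the law and inverts it. Under its assumptions (fixed problem size, no communication overhead), exactly 2x on two cores requires p = 1, and the 1.5X rendering result reported in this thread corresponds to a parallel fraction of about two-thirds.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup of a fixed workload when only part of it parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's Law: which parallel fraction yields this speedup on this many cores?"""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

print(amdahl_speedup(0.9, 2))         # ~1.82: even 90% parallel code falls short of 2x
print(amdahl_speedup(0.9, 4))         # ~3.08
print(parallel_fraction_for(1.5, 2))  # ~0.667
```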
Lacking <sarcasm> tags,
In this case, AMD appears to be trying to decouple the states enough that the out-of-order resolution doesn't require micromanaging all of the processes from a single control point.
Despite the lack of details, it sounds quite a bit like Intel's Mitosis research: http://www.intel.com/technology/magazine/research/speculative-threading-1205.htm
The article has simulated performance comparisons.
From the article:
"Today we rely on the software developer to express parallelism in the application, or we depend on automatic tools (compilers) to extract this parallelism. These methods are only partially successful. To run RMS workloads and make effective use of many cores, we need applications that are highly parallel almost everywhere. This requires a more radical approach."
... in this post they reported on a project supposedly aimed at breaking single threads down into multiple threads, so as to improve core utilization beyond the fourth core.
It supposedly involved Intel. I personally think both rumors are just that, but the timing is curious. Same source behind both? AMD PR people not wanting to lose out in imaginary rumored technology to Intel?
Michel
Fedora Project Contribut
Hyperthreading makes one core look like two. Reverse hyperthreading makes two cores look like one. So if we chain reverse hyperthreading with hyperthreading we can make one core look like one core but have twice as many features for the marketing department to brag about.
"The White House is not an intelligence-gathering agency," -- Scott McClellan, White House spokesman.
As a systems admin in a large datacenter with many AIX, Solaris, HPUX, Redhat, and Suse boxes, I'm glad to see a vendor who wants to simplify management of systems (one processor is easier to manage than two). This is to say nothing about all the developer effort that would be saved by not needing to write SMP-safe code. I want large, enterprise-level boxes to be just as easy to administer/use as the cheapest desktop in their line. The OS should see as-simple-as-possible hardware. You wouldn't believe all the different kinds of "system management consoles" I have to log into, which are always vendor-specific and annoying.
... these are the kinds of stories we get when the digg kiddies submit articles to /.
Rumor, wisps of hot air, and nothing definitive.
About the best language I've ever seen for multi-threading is occam, the language used with Transputers. occam allows threading to be done as a language primitive. http://en.wikipedia.org/wiki/Occam_programming_language
Engineering is the art of compromise.
"AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."
:)?
Even the article's writers don't seem sure that's possible. Apparently it's possible to "claim" it, though; what isn't?
Modern processors, including the Core Duo, rely on a complex "infrastructure" that allows them to execute instructions out of order, if certain requirements are met, or to execute several "simple" instructions at once. This is completely transparent to the code being executed.
For this to be possible, the instructions must not depend on each other's results; for example, an instruction that reads a register can't execute before, or at the same time as, the instruction that writes the value it needs.
This is an area where multiple cores could join forces and compute results for one single programming thread as the article suggests.
But you can hardly get twice the performance from two cores out of that.
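A rough way to see the limit being described: if every instruction can issue as soon as its inputs are ready, the length of the longest dependency chain, not the number of execution units, sets the minimum run time. The sketch below assumes a made-up (destination, sources) instruction format, unit latency, unlimited execution units, and perfect register renaming, so only true read-after-write dependencies matter.

```python
def dataflow_schedule(instructions):
    """Earliest issue cycle for each instruction, limited only by true data
    dependencies (read-after-write). Assumes unit latency, unlimited execution
    units, and perfect register renaming, so reusing a register name costs nothing."""
    ready_cycle = {}  # register -> cycle at which its latest value is available
    issue = []
    for dest, sources in instructions:
        cycle = max((ready_cycle.get(src, 0) for src in sources), default=0)
        issue.append(cycle)
        ready_cycle[dest] = cycle + 1
    return issue

program = [
    ("r1", ["a"]),         # r1 = load a
    ("r2", ["b"]),         # r2 = load b   (independent of r1)
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2  (waits for both loads)
    ("r4", ["c"]),         # r4 = load c   (independent)
    ("r5", ["r3", "r4"]),  # r5 = r3 * r4
]
cycles = dataflow_schedule(program)
print(cycles)                            # [0, 0, 1, 0, 2]
print(len(program) / (max(cycles) + 1))  # ILP: 5 instructions in 3 cycles
```

Adding more units to this example changes nothing: the r1 -> r3 -> r5 chain still needs three cycles.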
I just wonder what Oracle's CPU price index will be for this thing if it makes it out? Let me see, you have an AMD super single CPU core, not a hyper-threaded or standard one. You need to multiply (PI*GHZ/Watts*Sockets*200000)+(5000/days_until_quarter_closeout) to get the correct license fee. :-)
After all, there isn't a problem in time or space that can't be solved by simply reversing the polarity of the neutron flow.
Considering how awesomely powerful many CPUs are, I would think that they would continue moving towards more multi-core CPUs instead. After a while, lots of CPUs will outflank a fast one.
There are several techniques for increased performance or throughput that the designers of next gen microarchitectures are likely looking at.
There are extensions to known techniques:
A: more execution units, deeper reorder buffers, etc., trying to extract more Instruction Level Parallelism (ILP).
B: More cores = more threads
C: hyperthreading -- fill in pipeline bubbles in an OoO superscalar architecture; also = more threads
I personally don't think any of these carry you very far...
Then there are some new ideas:
a: run-ahead threads -- use another core/hyperthread to perform only the work needed to discover what memory accesses are going to be performed and preload them into the cache - mainly a memory latency hiding technique, but that's not a bad thing as there are many codes that are dominated by memory latency
a': More aggressive OoO run-ahead where other latencies are hidden
Intel has published some good papers on these techniques, but according to those papers these techniques help in-order (read Itanic) cores much more than OoO.
b: aggressive peephole optimization (possibly other simple optimizations usually performed by compilers) done on a large trace cache. Macro/micro-op fusion is a very simple and limited start at this sort of thing. (Don't know if this is a good idea or not, or whether anyone is doing it)
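As a toy illustration of the macro-op fusion mentioned in (b), the pass below scans a trace and merges a compare immediately followed by a conditional jump into a single op, the kind of pair x86 decoders are known to fuse. The tuple encoding and the `cmp+jcc` name are hypothetical; real hardware does this in its decoders on specific instruction pairs, not in software.

```python
def fuse_compare_branch(trace):
    """Toy macro-op fusion pass: merge a compare immediately followed by a
    conditional jump into one fused op, so it occupies one slot downstream."""
    fused = []
    i = 0
    while i < len(trace):
        op = trace[i]
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if op[0] == "cmp" and nxt is not None and nxt[0] == "jcc":
            fused.append(("cmp+jcc",) + op[1:] + nxt[1:])
            i += 2  # both original ops consumed
        else:
            fused.append(op)
            i += 1
    return fused

trace = [
    ("mov", "eax", "0"),
    ("add", "eax", "1"),
    ("cmp", "eax", "10"),
    ("jcc", "jl", "loop"),
]
print(fuse_compare_branch(trace))
# [('mov', 'eax', '0'), ('add', 'eax', '1'), ('cmp+jcc', 'eax', '10', 'jl', 'loop')]
```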
But it's far from clear what AMD is doing. Whatever it is, anything that improves single threaded performance will be very welcome. Threading is hard (hard to design, implement, debug, maintain, and hard to QA). And not all code bases or algorithms are amenable to it.
Intel's next gen (Nehalem) is likely going to do some OoO look-ahead, as they have Andy Glew working on it, and that's been an area of interest to him...
A very interesting new concept is that of "strands" (AKA: dependency chains, traces, or sub-threads). The idea is to schedule independent dependency chains instead of independent instructions. For more info, see http://www.cse.ucsd.edu/users/calder/papers/IPDPS-05-DCP.pdf
It's not clear how well it would apply to OoO architectures, but I would expect that likely approaches would also need large trace caches.
Applying this to an OoO x86 architecture, and detecting the critical strand dynamically in that processor could be very cool, and potentially revolutionary.
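To make the strand idea a bit more concrete, here is a greedy sketch that splits an instruction trace into dependency chains: each instruction extends the chain of a producer it reads from if that producer is still the chain's last instruction; otherwise it starts a new chain. This is a simple heuristic chosen for illustration, not the algorithm from the linked paper, and the (destination, sources) format is hypothetical. Each strand could in principle be steered to its own cluster of execution units; a cross-strand read, like the one in the last instruction below, is where values would have to move between them.

```python
def find_strands(instructions):
    """Greedily split a trace into dependency chains ("strands").
    An instruction extends the strand of a producer it reads from, provided that
    producer is still the tail of its strand; otherwise it starts a new strand."""
    producer = {}   # register -> index of the instruction that last wrote it
    strand_of = []  # instruction index -> strand id
    tails = {}      # strand id -> index of that strand's last instruction
    strands = []
    for idx, (dest, sources) in enumerate(instructions):
        chosen = None
        for src in sources:
            p = producer.get(src)
            if p is not None and tails.get(strand_of[p]) == p:
                chosen = strand_of[p]
                break
        if chosen is None:
            chosen = len(strands)
            strands.append([])
        strands[chosen].append(idx)
        strand_of.append(chosen)
        tails[chosen] = idx
        producer[dest] = idx
    return strands

program = [
    ("r1", ["a"]),         # 0
    ("r2", ["r1"]),        # 1: continues the chain from 0
    ("r3", ["b"]),         # 2: independent, starts a second chain
    ("r4", ["r3"]),        # 3: continues the chain from 2
    ("r5", ["r2", "r4"]),  # 4: joins the chains (needs r4 from the other strand)
]
print(find_strands(program))  # [[0, 1, 4], [2, 3]]
```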
It will be very interesting to see what Intel and AMD are up to -- it would be even cooler if they both find different ways to make things go faster...
Ian Ameline
This would be, perhaps, more useful with the quad-core version - appearing as two processors to still allow the OS to allocate multiple threads?
Is Microsoft going to recognise this contraption as a single processor, or a multi-license-able one?
And
Will AMD only hide the fact that there are multiple cores from operating systems other than Microsoft's?
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
I'm starting a big rendering job and this type of chip would be perfect. I've been using dual-chip motherboards for years and recently started using multicore chips. The best I've ever gotten is 1.5X from any dual-chip configuration, and generally it's much less. On rare occasions I've had dual systems run slower, but that hasn't happened too often. The only reason they are practical is that the software is so bloody expensive it makes sense. Most people get no benefit from dual-chip configurations, since very few programs can use more than one processor. Renderers are the exception. Even something like Maya can only use multiple cores or chips when rendering; all other functions use one processor. This type of integrated chip should solve the problem. Now, how many CPUs can you cram on a chip? Personally I'd like to see one the size of a dinner plate. Alright, it might have to have its own 220 line, but I can work with that. The neighbors might get annoyed when I start a render and their lights dim, though.
The bus between the two cores is FAR TOO SLOW for this sort of operation. Moving [say] EAX from core 0 to core 1 would take hundreds of cycles.
So if the theory is to take the three ALU pipes from core 1 and pretend they're part of core 0... it wouldn't work efficiently. Also what instruction set would this run? I mean how do we address registers on the second core?
AMD would get more bang for buck by doing other improvements such as adding more FPU pipes, adding a 2nd multiplier to the integer side, increasing L1 bandwidth, etc.
This story is pure and utter bullshit.
Tom
Someday, I'll have a real sig.
The idea is basically a way to keep extending Moore's Law with current Comp Sci paradigms. Multi-core (and multi-CPUs generally) is the same idea, but requires software re-thinking to really pay off.
More units don't help things go much faster unless you can figure out how to feed them.
Like multi-CPU tech, there's probably a big diminishing return, so this seems like a 2 to 4-ish X multiplier - or about 18 to 36 months more of Moore.
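The arithmetic behind that estimate, assuming performance doubles every 18 months (the period the comment's numbers imply): a multiplier M is worth 18 * log2(M) months of Moore's Law. A quick sketch:

```python
import math

DOUBLING_MONTHS = 18  # assumed doubling period, implied by "2x ~ 18 months"

def months_of_moore(multiplier: float) -> float:
    """Months of Moore's-Law progress equivalent to a given performance multiplier."""
    return DOUBLING_MONTHS * math.log2(multiplier)

for m in (2, 3, 4):
    print(m, round(months_of_moore(m), 1))
# 2 18.0
# 3 28.5
# 4 36.0
```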
--
graphicallyspeaking
There are various projects that take differing views about how to do this. One class of such processors are "run-ahead" microprocessors. The idea here is to let a second processor run up to a few thousand instructions "ahead" of the processor executing the real code, executing instructions whose results may be invalid and are never retired.
There are several variations of this. One is to use the second core to run in advance of the first thread, effectively acting as a dynamic, instruction-driven prefetcher. One such effort is "slipstream" processors, which use the advanced stream to "warm up" caches while the rear stream makes sure the results are accurate, and which dynamically remove unnecessary instructions from the advanced stream. Prior, similar research has done the same work using various forms of multithreading (like HT/SMT, and even coarse-grained multithreading). See www.cs.ucf.edu/~zhou/dce_pact05.pdf for more details.
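The cache-warming effect behind run-ahead and slipstream designs can be shown with a toy model. In the sketch below (all names hypothetical), a helper stream runs a fixed number of accesses ahead of the main stream and prefetches each line, and a prefetch only helps if it arrives before the main stream needs it. It is a simplification under stated assumptions: one access per time step whether or not it misses, a late prefetch counts as a full miss, and the helper is simply handed the address stream, which real run-ahead hardware has to discover by actually executing the code.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache that records when each line's data arrives."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # line -> time step at which its data is available

    def hit(self, line, now) -> bool:
        if line not in self.lines:
            return False
        self.lines.move_to_end(line)
        return self.lines[line] <= now  # a prefetch still in flight counts as a miss

    def fill(self, line, ready_at) -> None:
        if line in self.lines:
            ready_at = min(ready_at, self.lines[line])
            self.lines.move_to_end(line)
        self.lines[line] = ready_at
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line

def demand_misses(trace, distance, latency, capacity=64):
    """Count misses seen by the main stream while a helper runs `distance`
    accesses ahead, issuing prefetches that take `latency` steps to arrive."""
    cache = LRUCache(capacity)
    misses = 0
    for now, line in enumerate(trace):
        ahead = now + distance
        if distance > 0 and ahead < len(trace):
            cache.fill(trace[ahead], now + latency)  # helper stream's prefetch
        if not cache.hit(line, now):
            misses += 1
            cache.fill(line, now)                    # main stream fetches it itself
    return misses

trace = list(range(200))  # a streaming walk with no reuse
print(demand_misses(trace, distance=0, latency=10))   # 200: no helper
print(demand_misses(trace, distance=4, latency=10))   # 200: helper not far enough ahead
print(demand_misses(trace, distance=16, latency=10))  # 16: only the warm-up misses remain
```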
Others, such as Dynamic Multithreading techniques, take single-threaded code and use hardware to generate other threads from a single instruction stream. Akkary (at Intel) and Andy Glew (previously Intel, then AMD, then...?) have proposed these ideas, as have others. Some call it "Implicit Multithreading".
Now, the Register article is so wimpy (as usual) that there's no actual information about what technologies are used, but maybe it's a variation on one of the above.
From here:
Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well-publicized arguments resulted from misunderstandings of the nature of both laws.
This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.
We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation.
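The equivalence claim can be checked with the standard textbook derivation (a sketch, not necessarily the paper's own formulation). Let N be the processor count and α the serial fraction of the time measured on the parallel machine, with that run time normalized to 1. Running the same total work on one processor takes α + (1 - α)N, and Amdahl's serial fraction f is measured on that single-processor run:

```latex
\begin{aligned}
T_N &= \alpha + (1-\alpha) = 1, \qquad T_1 = \alpha + (1-\alpha)N, \\
S_G &= \frac{T_1}{T_N} = \alpha + (1-\alpha)N = N - \alpha(N-1), \\
f &= \frac{\alpha}{\alpha + (1-\alpha)N}, \qquad
1 - f = \frac{(1-\alpha)N}{\alpha + (1-\alpha)N}, \\
S_A &= \frac{1}{f + \frac{1-f}{N}}
     = \frac{\alpha + (1-\alpha)N}{\alpha + (1-\alpha)}
     = \alpha + (1-\alpha)N = S_G .
\end{aligned}
```

So the two laws give the same speedup once the serial fraction is measured consistently; they differ only in which run that fraction is measured on.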
That makes no sense at all. So you want all boxes to act as uniprocessors... and then what happens when you want to run multiple tasks at once? You do realize that sometimes you just want things to run in parallel, don't you?
Judging by your response, I highly doubt you administer systems in a large datacenter, because it makes absolutely no sense. I don't know any admin who would want only one processor, logical or not, in a large server. There's WAYYYY too much that needs to go on at the same time. There's a reason why Sun sells 128-way systems, and it's not because they can get the job done with one really fast CPU.
We have always been at war with hyperthreading!
Imagine if you had TWO C&C chips to split the workload. What about 4? What about an array of C&C chips to send jobs to a farm of workhorse processors?
Where are the monkeys?
Still sounds like distributed.net.
I write a fair shitload of multithreaded and single-threaded code. Most code cannot be magically parallelized. Parallel execution of code that has not been made thread-safe would cause teeming masses of race conditions. Null pointers everywhere. Division by zero would be the norm, not an exception.
Now, if they're talking about allowing separate processes to run separately without specific SMP code in the kernel, fine. But that's not 2x performance.
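The race conditions mentioned above come from non-atomic read-modify-write sequences. Real data races are timing-dependent, so the sketch below simulates two threads deterministically with Python generators (the function names and schedule strings are hypothetical): each `yield` marks a point where a context switch could happen, and the schedule decides which thread runs next.

```python
def increment(shared):
    """Non-atomic read-modify-write of shared["counter"], split into two steps."""
    value = shared["counter"]      # step 1: read
    yield                          # a context switch may happen here
    shared["counter"] = value + 1  # step 2: write back

def run(schedule):
    """Run two simulated threads, A and B, advancing whichever one the schedule names."""
    shared = {"counter": 0}
    threads = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(threads[name], None)  # returns None once that thread has finished
    return shared["counter"]

print(run("AABB"))  # 2: A finishes before B starts
print(run("ABAB"))  # 1: both read 0, so one increment is lost
```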
Stop-Prism.org: Opt Out of Surveillance
Sorry, I don't have mod points. That's pretty darn informative right there.
I think that's a great example of the problems facing researchers in mathematics (and the sciences) today. It's really hard to make connections between all of the disparate facts, theories, and experimental data to draw conclusions and lead to productive research and development. In short, we often experience mental stack overflow errors.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Unless their compiler can predict the future, multiple cores will always have synchronization issues that keep them from approaching twice the performance of one core.
This was proposed in academia over 10 years ago. It's called speculative multithreading, or "multiscalar" as coined by one of the primary inventors at the University of Wisconsin (Guri Sohi).
Basically the processor will try to split a program into multiple threads of execution, but make it appear as a single thread. For example, when calling a function, execute that function on a different thread and automatically shuttle dependent data back/forth between the callee and the caller.
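A software analogy of that function-call split, as a sketch with hypothetical names: the callee runs on one thread while the code after the call runs speculatively on another, using a predicted return value. If the prediction was right, the speculative work is committed; otherwise it is squashed and re-executed. Predicting the return value is just one simple scheme assumed here, not necessarily what multiscalar hardware does, and the continuation must be side-effect-free, standing in for the hardware's buffering of speculative state. Python threads only illustrate the control flow; they won't speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_call(callee, arg, continuation, predicted):
    """Run callee(arg) and, in parallel, the code after the call with a predicted
    return value. Commit the speculative result only if the prediction was right;
    otherwise squash it and re-run the continuation with the real value."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        callee_future = pool.submit(callee, arg)
        spec_future = pool.submit(continuation, predicted)
        actual = callee_future.result()
        speculative = spec_future.result()
    if actual == predicted:
        return speculative, "committed"
    return continuation(actual), "squashed"

def lookup_price(item):  # hypothetical callee
    return {"apple": 3, "pear": 5}[item]

def after_call(price):   # hypothetical code following the call site
    return price * 10 + 1

print(speculative_call(lookup_price, "apple", after_call, predicted=3))  # (31, 'committed')
print(speculative_call(lookup_price, "pear", after_call, predicted=3))   # (51, 'squashed')
```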
"...two-core chip or provide quadruple the performance with a quad-core processor." unify, unite, and unihilate....beware the QUAD LAY-ZAH!
my site of misleading and incorrect information!
It might be interesting if they took this idea in a slightly different direction. Set it up so the OS detects two CPUs. But, when the OS fails to utilize both CPUs effectively, allow the idle CPU to take some of the active CPU's load. I'm taking this idea from nVidia's work on load balancing between graphics and physics in an SLI setup. So in this case the OS gets the best of both worlds: the ability to break tasks off to each CPU, and a free boost when it's stuck with a single CPU-limited thread.
Striping: what is that? RAID 0. You take multiple disks, present them as one, and let the controller make the most efficient use of them while the OS and all the programs just have to deal with one big disk.
Looks like the same thing. You take multiple CPUs, present them as one, and let the controller figure out how best to use them.
This could make for hot-swappable CPUs (heh) and the ability to have a CPU die without taking out your system. The redundancy of the other RAID levels doesn't seem to translate very easily, but the "encapsulation" concept seems to fit nicely.
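For comparison, the RAID 0 mapping itself is just modular arithmetic. The sketch below (parameter names are hypothetical) maps a logical block to a disk and an offset on that disk, round-robin in stripe units. Striping works so cleanly because blocks at different addresses are independent of each other; consecutive instructions in one thread often are not.

```python
def stripe_location(logical_block: int, num_disks: int, stripe_blocks: int):
    """Map a logical block to (disk, block-on-disk) under RAID 0 striping,
    with `stripe_blocks` consecutive blocks per stripe unit, round-robin across disks."""
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks
    row = stripe_index // num_disks
    return disk, row * stripe_blocks + offset

# Two disks, four blocks per stripe unit:
for block in (0, 3, 4, 8, 13):
    print(block, stripe_location(block, num_disks=2, stripe_blocks=4))
# 0 (0, 0)
# 3 (0, 3)
# 4 (1, 0)
# 8 (0, 4)
# 13 (1, 5)
```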
--Welcome to the Realm of the Hawke--
How is one processor easier to manage than two? The OS takes care of it for you. All you have to do is make sure the load is appropriate and balanced. But you have to do that anyway... The problem with the OS seeing "as-simple-as-possible" hardware is that it can't take advantage of any of the features that you get with high-end hardware. You can't get good diagnostics. And it is difficult to tune for a particular task. What if the algorithm that AMD uses to parallelize single threads isn't very good for your particular application? How do you tune the OS for a particular task if everything is done by the hardware?
If you don't care about these things, perhaps you should consider a new line of work.
-matthew
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
To heck with powering millions of individual computers. Just make a million-core chip and wire them all up like this and you'll have one computer with the power of millions less all that nasty overhead of having to run an OS on all of the other computers. Power to the people!
Could be that what is envisioned is to have one and the same cache shared between two processor cores, with no need to separate it or arbitrate most of the time, so the coupling is looser than multiple execution units on a machine. What could wind up happening is that programs would be sequential only when forced to be. If it were a whole new ISA this might be interesting to program anew. As it is, there'll need to be a bunch of these implied synch points to emulate 80x86 well enough. Lord knows if enough parallelism can be found to make it worthwhile.
imagine what you could do with it!
FTA: It's the very antithesis of the push for greater levels of parallelism
There is only one way to achieve optimum performance using multiple cores (or multiple processors) and that is to adopt a non-algorithmic, signal-based, synchronous software model. In this reactive model, there are no threads at all, or rather, every instruction is its own thread or processor object. It waits for a signal to do something, performs its operation and then sends a signal to one or more objects. Any number of operations can be executing in parallel. At every tick of a virtual clock, there is a list of operations to be executed. These can be channelled to the available cores for processing, ensuring a full load for the cores at all times.
The only caveat is the von Neumann memory access bottleneck, which gets you every time. In the end, I suspect that only optical computing or something based on quantum tunnelling will get around this very serious problem.
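A minimal sketch of the tick-driven, signal-based model described above, with hypothetical names: each operation fires once all of its input signals exist, and everything that becomes ready in the same tick could be dispatched to separate cores. It illustrates only the scheduling idea and says nothing about the memory bottleneck.

```python
import operator

def run_reactive(ops, inputs):
    """Tick-driven dataflow: an operation fires once signals (values) for all of
    its inputs exist. Operations that are ready in the same tick are independent
    and could each be handed to a separate core."""
    values = dict(inputs)
    pending = dict(ops)  # name -> (function, [input names]); copied, caller's dict untouched
    ticks = []
    while pending:
        ready = sorted(name for name, (_, args) in pending.items()
                       if all(a in values for a in args))
        if not ready:
            raise ValueError("some operations never receive their input signals")
        fired = {name: pending[name][0](*(values[a] for a in pending[name][1]))
                 for name in ready}
        values.update(fired)  # these results are the signals for the next tick
        for name in ready:
            del pending[name]
        ticks.append(ready)
    return values, ticks

# (a + b) * (c - d)
ops = {
    "s": (operator.add, ["a", "b"]),
    "t": (operator.sub, ["c", "d"]),
    "p": (operator.mul, ["s", "t"]),
}
values, ticks = run_reactive(ops, {"a": 1, "b": 2, "c": 5, "d": 3})
print(ticks)        # [['s', 't'], ['p']]
print(values["p"])  # 6
```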
You have a point about 1 processor being basically equally as easy as two to manage. It is the usefulness of this example as perhaps the start of a larger trend of simplifying system management that you fail to give value to. It is this potential trend that got me happy.
Think what you will, I know damn well where I work, and how much of my time is wasted learning yet another proprietary management tool that only brings very marginal benefits at the end of the day. That's why all of Google's datacenters run ONLY 1U cheapo rackmount servers with virtually no hardware redundancy. They agree that big iron just isn't worth the huge increase in cost. It is me who doubts you understand how marginal the gains from "big iron" usually are. They do failover in software (designed in house, unfortunately for the rest of us). I agree that many processors are good, but it is even better when they appear as one.
The easiest boxes to admin have one processor, one power supply, one NIC, etc. When there are problems, there are fewer places to check. They do not need exotic drivers, or exotic options to turn on "the full powers". Think "make -j 2" when you compile a kernel, when all I wish I **needed** to know was "make". The more I have to learn all the little gotchas and caveats to using all this fancy stuff, the less productive I am, generally speaking.
Isn't this exactly what Intel did with the IA-64 instruction set, i.e., the Itanium family, which added explicit support for simultaneous instruction execution?
Personally I'm still a big fan of this instruction set/system and feel it's a real shame that backward compatibility/resistance to change has kept it out of the mainstream. I would dearly love the irony if AMD tried to introduce an Itanium-like processor now.
If you liked this thought maybe you would find my blog nice too:
Armchair engineer here: Why couldn't they just virtualize the registers?
The case you stated is balancing 2 tasks (1. physics, 1. graphics) over 2 processors. - fine
Then you say 'well, they're doing that easily enough, lets divide one task over two processors', without noticing it's a totally different problem.
I'm sorry, but I think the only thing 'insightful' about this comment is that you don't really understand what you're talking about.
Is this more or less like a Beowulf cluster on a chip?
No, seriously, I'm having trouble envisioning it.
You see? You see? Your stupid minds! Stupid! Stupid!
I don't have a lot of background in CPU architecture, but what if there was a parallel processing unit designed specifically to allocate threads to the CPUs? This way, the cores can all function as one at the hardware level, rather than the software level (thus making it easier on developers and potentially increasing performance). Would it be better to have a dedicated unit/sector to process this information and divvy it up to the separate cores, or no?
I might be talking out my ass here, but it seems to me that the best way to run a multi-core processor as a single would be to have the processor being used change when there is a context switch. Seems simple enough doesn't it?
The purpose of "good dispatching" (i.e. out-of-order execution) is to hide the latencies of misses to main memory (it takes between 200 and 400 cycles these days to get something from memory, assuming that the memory bus isn't saturated), by executing instructions following the miss but not dependent on it. Out-of-order execution has been around since the Pentium Pro, btw.
The Raven
I've always heard that Windows support for multiple processors was pretty limited. Is this still the case? Is limited multiprocessor support in Windows encouraging the development of this technology? Does load balancing of processes or threads across processors happen automatically?
I keep hearing that people get dual-processor or dual-core machines like the Apple Core Duos, but that one processor or core lies dead under Windows. Is this actually the case? Is this just a driver problem specific to a few machines? Do you need Windows Server 2003 or something?
Rather than trying to make an end run around Amdahl's law, why not duplicate the processor paths?
Say you have a single threaded application with lots of branches and little instruction level parallelism (ILP). Rather than trying to predict the branches or worry about read-before write errors, just clone the processor state and run BOTH branches simultaneously! If you have a core (or three) just lying around while you run a single threaded app, use it. No need for prediction at all, and no penalty for mispredicting a branch. Just dump the state of the cores that missed, clone the "correct" path to all available cores, and keep going. Assuming core-to-core cloning is fast and there is no ILP that could be taken advantage of by the other cores, why waste them?
Tell me I'm not the first person to think of this, 'cause it's too obvious.
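You're not the first: running both sides of a branch is often called dual-path or eager execution. Here is a minimal sketch with hypothetical names, where cloning a dict stands in for copying register state between cores. One thing the sketch makes visible is that the number of copies doubles with every branch that is still unresolved, so a few spare cores only cover a couple of levels.

```python
import copy

def eager_branch(state, condition, taken, not_taken):
    """Dual-path ("eager") execution of one branch: run both sides on cloned
    copies of the register state, then keep the copy the condition selects."""
    taken_state = copy.deepcopy(state)      # clone for core 1
    not_taken_state = copy.deepcopy(state)  # clone for core 2
    taken(taken_state)
    not_taken(not_taken_state)
    # Once the condition resolves, the wrong path's state is simply discarded.
    return taken_state if condition(state) else not_taken_state

def paths_in_flight(unresolved_branches: int) -> int:
    """Copies needed to cover every outcome of N branches still in flight."""
    return 2 ** unresolved_branches

regs = {"eax": 7, "ebx": 0}
result = eager_branch(
    regs,
    condition=lambda s: s["eax"] > 5,
    taken=lambda s: s.update(ebx=1),
    not_taken=lambda s: s.update(ebx=-1),
)
print(result)                                  # {'eax': 7, 'ebx': 1}
print([paths_in_flight(n) for n in range(5)])  # [1, 2, 4, 8, 16]
```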
Think of it - Multi-core CPU's bound to appear as a single CPU, and then Hyperthreading on top? =)
- It's not the Macs I hate. It's Digg users. -
None of this will mean anything to desktop PCs; this is something that only the HPC crowd would need. First you have to start from the basic assumption that you have an algorithm that you only know how to code in serial fashion. And you need it to run as fast as possible. You would like to run it on a single 6 GHz CPU, but all you have on hand is a pair of 3 GHz CPUs, and this algorithm of yours can't be partitioned for parallel execution.
Intuition tells me that if a competent programmer with complete access to information about and understanding of a particular algorithm can't figure out how to effectively parallelize it, there's no way a hardware state machine will do a better job of it.
Again, only the HPC crowd would ever run this way. E.g., you could boot DOS on your 3 GHz Opteron and it would be several orders of magnitude faster than Windows, but it would only run one thing at a time. Very few desktop PC users can live with single-tasking today, otherwise more people would still be using DOS. And multiple cores are better for running multiple independent tasks than a single core, so this whole pursuit is only useful if you want to use a monster machine to solve a single problem at a time (e.g. Deep Thought).
-- *My* journal is more interesting than *yours*...
Excellent and informative post.
Ian Ameline
When you have two cores, it would be a real waste to not let an SMP-optimized program see both of them. That's why I doubt that if this ever becomes a product, it will look to the OS like a single core. But if it really is possible to let two cores cooperate on running a single thread, it would be nice for them to do so when an application is only willing to run as a single thread.
Let's try an analogy: Assume two heads are better than one. But some tasks are explicitly not meant for two heads, say, taking a math test. So say I go in for the test, appearing only as a single "head" to the test-giver (the "interface") but I covertly ask my friend to help me on the side. This makes my result better. Of course if the "interface" explicitly allowed for team work on the test, there would be no reason for the covert (probably inefficient) communication, so we could drop the pretense of being one person. So the analogy is, when a single core is asked to "work alone" on a problem and it can figure out how to get useful help from its friend so the work goes faster, AMD wants to make sure that it really gets the help.
AMD is planning to increase the number of cores on a single chip. I would expect to see hundreds of cores on a single chip a decade from now. The question becomes, how best to allocate cores to different processes. Should each thread just get one core? Should the cores run at different speeds, with some threads getting faster cores than others?
From a design standpoint, it is better to run every core at the same speed. It makes design much simpler because every core is the same. The problem then becomes what to do with a thread that needs a faster core. It would be very nice to be able to combine multiple slower running cores to form a virtually fast running core.
AMD's strategy is obvious: when Intel steps in a direction that breaks backwards compatibility or requires industry-wide changes, they step in and provide the lazy route for the industry. If they hadn't adopted this strategy for x86_64, x86 would be on its dying breath right now. It's annoying really... I used to hate Intel's backwards compatibility dogma, but now AMD has also tried to take advantage of it. Here they are now, with a likely half-baked solution that promotes the status quo. This will likely persuade the business types wary of risk. In my view, Intel, AMD, and Microsoft are all guilty of holding the industry back. We are doomed to sit at the local maximums because of these policies.
But, with two cores, you could have a way to predict "branch" and "not branch" at every prediction spot. The core that gets it right sends the registers to the other core so they can continue as if every branch were predicted correctly...
That would only work if you had a nice fast way to copy registers across in a very small number of clock cycles... so again, just a bunch of speculation. But it was a neat enough idea that I had to say it.
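The dual-path idea above (sometimes called eager execution) can be modeled in a few lines. This is a toy sketch, not anything AMD has described: a thread pool stands in for the second core, each path works on its own copy of a hypothetical register dict, and the path that matches the resolved condition is kept. The names `eager_branch`, `add_one`, and `sub_one` are illustrative only; in real hardware the expensive part is exactly the register copy the post worries about.

```python
from concurrent.futures import ThreadPoolExecutor


def eager_branch(cond_fn, taken_fn, not_taken_fn, state):
    """Toy dual-path execution: start both branch paths before the
    condition is known, then keep only the one that was actually taken."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each "core" gets its own copy of the register state.
        taken = pool.submit(taken_fn, dict(state))
        not_taken = pool.submit(not_taken_fn, dict(state))
        # Meanwhile the branch condition resolves.
        resolved = cond_fn(state)
        winner = taken if resolved else not_taken
        # "Copy the registers across": the winner's state becomes current.
        return winner.result()


def add_one(regs):
    regs["r1"] += 1
    return regs


def sub_one(regs):
    regs["r1"] -= 1
    return regs


regs = {"r1": 10}
print(eager_branch(lambda s: s["r1"] > 5, add_one, sub_one, regs))  # {'r1': 11}
```

Note that the losing path still runs to completion here; a real design would squash it as soon as the branch resolves.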
Mark of the Coder fades from you. You perform Opening on World of Warcraft. Warcraft crits GPA for 4. GPA dies.
"This is to say nothing about all the developer effort that would be saved from not needing to make SMP-safe code".
That's silly. Come on, you still have to write "SMP-safe" code if you want to run more than one thread, or even more than one process if they access shared resources. And it happens every day...
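To make the point above concrete: even if the OS sees only one core, the scheduler still interleaves threads, so an unprotected read-modify-write on shared data can lose updates. A minimal sketch in Python, assuming a hypothetical `SharedCounter` class as the shared resource:

```python
import threading


class SharedCounter:
    """A counter that several threads may update safely."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write must happen under the lock; without it two
        # threads can read the same old value and one update is lost.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```

The lock is needed regardless of how many cores the hardware presents, which is why "one logical CPU" would not remove the need for thread-safe code.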
Most of the time the CPU would be running in its special "appears to the OS as a single core" mode, but if a process explicitly wanted to see all of the available cores, it could have an SMP-aware OS run on the CPU's hardware virtualisation. Might not work; I'm not sure.
Eh? What? The Register is British.
Yes, embarrassing isn't it, old chap?
Willy
Multithreading is more than just speeding things up. Quite importantly, it also allows for more responsive operation. Example: having the GUI and the logic in separate processes lets the GUI respond while the logic is busy. This doesn't make anything "faster"; it just makes things more responsive.
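The responsiveness argument above can be sketched with a worker thread (threads within one process rather than separate processes, but the idea is the same). In this hypothetical example, `slow_logic` stands in for the heavy computation and the polling loop stands in for a GUI event loop that keeps handling events until the result arrives:

```python
import queue
import threading
import time


def slow_logic(n, results):
    """Stand-in for a long computation the GUI must not wait on."""
    time.sleep(0.2)  # pretend this is heavy work
    results.put(sum(range(n)))


results = queue.Queue()
worker = threading.Thread(target=slow_logic, args=(1000, results), daemon=True)
worker.start()

ticks = 0
while True:
    try:
        answer = results.get_nowait()
        break
    except queue.Empty:
        ticks += 1  # the "GUI" keeps handling events here
        time.sleep(0.01)

print(answer, ticks > 0)  # 499500 True
```

This helps even on a machine with a single core, since the benefit comes from not blocking the event loop, not from extra throughput.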
what's the deal with putting all these things in parallel?
everybody knows they would work faster in SERIES!
It seems that there will be a two-core AMD processor working as hard as it can while presenting itself to the system as a single processor. I don't know whether that will let it get more work done over time or not. As for cooling, from the way it sounds, it shouldn't overheat much. So in reality, I don't really know what AMD is aiming for, other than putting Intel's HyperThreading technology down in the market.
"Instant gratification takes too long." - Carrie Fisher
One good example: the ability to dynamically switch between SLI modes is what is currently making me look into it for my new build, since I can use multiple monitors during the day for work and one for gaming at night. Applying this concept to CPUs, I think you could build chips that are both powerful and flexible, instead of one or the other as is now the case.
If a nation expects to be ignorant and free, in a state of civilization, it expects what never was and never will be.-TJ
What part of that article is relevant to the topic of the new AMD processors?
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
On the surface, this sounds a bit like transforming single threaded into multithreaded code, which as I gather, is pretty much impossible to do in a generic and widely useful fashion.
What it may be instead is the ability to dynamically reallocate execution logic into different configurations of 'cores' at will, so the CPU could appear either as a single core with 6 execution units or as a dual core with 3 units in each, to take advantage of either instruction- or thread-level parallelism, whichever is more abundant at the moment. This way it isn't creating extra parallelism out of thin air that wasn't there before (which, I believe, would be either ridiculously hard or impossible to do), but rather makes maximum use of whatever parallelism is there to be found*.
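The reconfiguration idea above can be illustrated with a deliberately crude throughput model. Everything here is an assumption for illustration, not AMD's design: execution units split evenly across cores, each runnable thread can issue at most `ilp` independent instructions per cycle, and memory, register files, and interconnect costs are ignored. `throughput` and `best_split` are hypothetical names.

```python
def throughput(units, cores, threads, ilp):
    """Instructions per cycle if `units` execution units are split evenly
    into `cores` cores, `threads` threads are runnable, and each thread can
    issue at most `ilp` independent instructions per cycle."""
    per_core = units // cores
    busy_cores = min(cores, threads)
    return busy_cores * min(per_core, ilp)


def best_split(units, threads, ilp, options=(1, 2, 3, 6)):
    """Pick the core count that maximizes throughput (ties go to fewer cores)."""
    return max(options, key=lambda c: (throughput(units, c, threads, ilp), -c))


# One thread with plenty of ILP: keep all 6 units in one wide core.
print(best_split(6, threads=1, ilp=4))  # 1
# Two threads with modest ILP: two 3-unit cores do better.
print(best_split(6, threads=2, ilp=2))  # 2
```

Under this model the chip never creates parallelism; it only picks the configuration that exploits whichever kind is available.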
I'm not sure how theoretically possible this is, but it does seem more likely than the other proposition.
Work is punishment for failing to procrastinate effectively.
I'm surprised you are willing to sacrifice performance for simplicity. You realise a single core can only run a single process at once, correct?
This is to say nothing about all the developer effort that would be saved from not needing to make SMP-safe code.
lol. It really isn't that hard to make code thread- and SMP-safe. Check here for a good starting point. I taught myself to write thread-safe code as a hobby over about two weeks, then converted my thesis work and a simulation toolkit to be thread-safe in about a week. It really isn't hard. I'm not a CS person either; I'm an aerospace engineer. Now I can do a sh*tload of Monte Carlo runs on my dual-core box at home at double the rate. (By the way: Qt's thread library is great, but if you can't live with the license, check out OpenSceneGraph's OpenThreads library.)
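The Monte Carlo case above is the easy kind of parallelism: independent runs that share nothing. The poster's libraries are C++ (Qt, OpenThreads); this is a Python sketch of the same pattern, assuming a hypothetical pi-estimation run as the simulation. The key to thread safety is that each run owns its own random generator. In CPython the GIL prevents pure-Python threads from doubling throughput; swapping in `ProcessPoolExecutor` (same interface) would use both cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(samples, seed):
    """One independent Monte Carlo run. Each run owns its own generator,
    so no random-number state is shared between threads."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_batch(runs, samples, workers=2):
    seeds = range(runs)  # distinct seeds make every run reproducible
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_pi, [samples] * runs, seeds))


estimates = run_batch(runs=8, samples=20_000)
print(sum(estimates) / len(estimates))  # roughly 3.14
```

Because each result depends only on its seed, the batch gives identical answers no matter how the runs are scheduled across threads.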
I'd like to believe you have found the Silver Bullet. Any examples of real complex systems you have developed with it?
Forget any talk about CPU efficiency. This is a software licensing play first and foremost. Much of the most expensive software (*cough*Oracle*cough*) is licensed per CPU. There has been some browbeating by AMD and Intel to get ISVs to license per socket, and some have gone this way, but there has been a lot of acrimony. If you license per CPU, the software company makes out like a bandit as all the machines have twice as many CPUs in them. The result is some customers defer upgrading CPUs, and AMD/Intel lose out. If the ISVs charge per socket, AMD/Intel is very happy since it's a no-brainer for their customers to upgrade, but the ISV perceives a potential loss of revenue.
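The licensing arithmetic above can be spelled out with made-up numbers (the $10,000 price and the `license_bill` helper are hypothetical illustrations, not any vendor's actual terms). The assumption is that a per-CPU license counts whatever CPUs the OS reports, which is exactly what "reverse hyperthreading" would change:

```python
def license_bill(sockets, cores_per_socket, price, scheme, reverse_mt=False):
    """License cost under a per-CPU or per-socket scheme. With reverse_mt,
    each multicore chip is assumed to report itself as a single CPU."""
    # CPUs the OS (and thus the license checker) would see.
    cpus_seen = sockets if reverse_mt else sockets * cores_per_socket
    units = cpus_seen if scheme == "per_cpu" else sockets
    return units * price


old_single_core = license_bill(2, 1, 10_000, "per_cpu")
dual_per_cpu = license_bill(2, 2, 10_000, "per_cpu")
dual_per_socket = license_bill(2, 2, 10_000, "per_socket")
dual_reverse_mt = license_bill(2, 2, 10_000, "per_cpu", reverse_mt=True)
print(old_single_core, dual_per_cpu, dual_per_socket, dual_reverse_mt)
# 20000 40000 20000 20000
```

Upgrading a two-socket box to dual-core doubles the per-CPU bill, unless either the ISV switches to per-socket pricing or the chip reports one CPU per socket.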
In the "old days" where CPUs just got faster and faster, the ISVs didn't complain about this. In fact, they benefitted since they could cram more bugs^H^H^H^Hfeatures into each sequential release and the customer didn't complain. Now the prevalence of dual core CPUs makes them feel like they're potentially leaving money on the table.
Enter "reverse hyperthreading." Make a multicore CPU look like a single CPU, and those software licensing issues go away and we're back to the good old days.
Do I still get a free wedgie with that one?
OKLAHOMA !!! OOOOOOOKLAHOMA !!!
You have a point that one processor is basically as easy to manage as two. What you fail to value is this example's usefulness as perhaps the start of a larger trend toward simpler system management. It is this potential trend that made me happy.
I don't think that's a trend at all. I think this is AMD trying to find a gimmick to oust Intel, assuming it's true at all. If you want easy to manage, buy M$. But realize that with that ease of management you lose the flexibility of *nix.
Think what you will; I know damn well where I work, and how much of my time is wasted learning yet another proprietary management tool that brings only marginal benefits at the end of the day. That's why all of Google's datacenters run ONLY cheap 1U rackmount servers with virtually no hardware redundancy. They agree that big iron just isn't worth the huge increase in cost. I'm the one who doubts that you understand how marginal the gains from "big iron" usually are. They do failover in software (designed in house, unfortunately for the rest of us). I agree that many processors are good, but it is even better when they appear as one.
I hate to be the one to break it to you, but google is moving away from this. We have several accounts with them and I can tell you they're doing just the opposite. They've found all the 1U's to be a waste of energy and resources and have been looking to purchase "big iron" to replace it.
The easiest boxes to admin have one processor, one power supply, one NIC, etc. When there are problems, there are fewer places to check. They don't need exotic drivers or exotic options to turn on "the full powers". Think "make -j 2" when you compile a kernel, when all I wish I **needed** to know was "make". The more I have to learn all the little gotchas and caveats of using all this fancy stuff, the less productive I am, generally speaking.
No, the easiest boxes to administer are the ones that were engineered with administration in mind. I'd MUCH rather have a "big iron" Sun box that says "HEY FUCKER, CPU6 IS FUBAR" than a 1U/1-CPU whitebox that just starts having compile issues and random reboots with no apparent cause. What you want is good engineering; you're just confused. I can tell you from personal experience that it can be far easier to track down a problem on a 28-way Sun box than on a no-name, built-from-Newegg whitebox.
John Carmack was recently interviewed regarding the new multicore game consoles. One of the more memorable quotes:
"...Anything that makes the game development process more difficult is not a terribly good thing."
This might help games, but by the time it comes out it could be too late. At IDF, Intel demonstrated some very powerful programmer aids for multi-threaded programming. By the time AMD's technology is on the market, will it offer an improvement to games that take advantage of new multi-threading techniques?
Given that the current "make your program multithreaded" techniques are primarily making "for" loops multi-threaded, is AMD's technique going to "automagically" distribute loops among all of the cores?
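For reference, "making a for loop multi-threaded" usually means splitting independent iterations into chunks and handing one chunk to each worker, roughly what OpenMP's static schedule does. A minimal Python sketch, assuming a hypothetical `parallel_for` helper; the programmer asserts that iterations are independent, which is the knowledge a CPU presenting itself as one core would have to discover on its own. (In CPython the GIL limits CPU-bound speedup with threads; this shows the structure only.)

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(n, body, workers=4):
    """Split iterations 0..n-1 into contiguous chunks, one per worker,
    roughly like OpenMP's static schedule, and run body(i) for each i.
    Assumes iterations are independent of each other."""
    if n <= 0:
        return []
    chunk = (n + workers - 1) // workers  # ceiling division

    def run_chunk(start):
        return [body(i) for i in range(start, min(start + chunk, n))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(run_chunk, range(0, n, chunk))  # results stay in order
        return [x for part in parts for x in part]


squares = parallel_for(10, lambda i: i * i)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```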
No, I will not work for your startup