Intel - Market Doesn't Need Eight Cores
PeterK writes "TG Daily has posted an interesting interview with Intel's top mobility executive David Perlmutter. While he sideswipes AMD very carefully ('I am not underestimating the competition, but..'), he shares some details about the successor of Core, which goes by the name 'Nehalem.' Especially interesting are his remarks about power consumption, which he believes will 'dramatically' decrease in the next years as well as the number of cores in processors: Two are enough for now, four will be mainstream in three years and eight is something the desktop market does not need." From the article: "Core scales and it will be scaling to the level we expect it to. That also applies to the upcoming generations - they all will come with the right scaling factors. But, of course, I would be lying if I said that it scales from here to eternity. In general, I believe that we will be able to do very well against what AMD will be able to do. I want everybody to go from a frequency world to a number-of-cores-world. But especially in the client space, we have to be very careful with overloading the market with a number of cores and see what is useful."
I frequently run as many as 8 programs at a time, sometimes more, but I seriously doubt each program would know what to do with its own core. With my two-CPU set-up, I find RAM to be almost the biggest limiting factor (although with 2GB, I've never actually run out). There's really no need for 8 cores until my brain is able to take multitasking to the next level, doing many many complex tasks that would gain benefit from (essentially) unlimited CPU power for each program.
They say the biggest bottleneck of any modern computer is its user...
-mrxak
Onions Will Kill You
I recently read about a 1024-core chip for small devices like cell phones. Each core ran on a simplified instruction set and specialized in a certain task like muting the microphone when incoming sounds are too quiet, smoothing text on the low resolution screen, and other minute tasks. Individual cores could be placed in low power sleep mode until the software dictated a need for that instruction set.
Is it possible to couple CISC and RISC cores on one die? Is this how the math coprocessors of the 386 era worked? This sounds like an ideal solution to me since nobody needs 4 or 8 cores to be fully powered and ready to pounce at all times.
See, here's where I have to disagree.
Imagine an RPG that has hundreds of 'computer' competitors that are "developing" along the same lines as you and your character(s). Or perhaps an MMORPG with thousands of players, competing against hundreds of thousands of virtual characters that are developing along the same lines as your own and the MMORPG's characters. Say goodbye to random encounters with stale NPCs, and hello to enemies with unique names and playing styles, all due to the computer's ability to handle such incredible virtualization.
Adding more RAM and a minor increase in speed wouldn't help in either of these scenarios. Bring on the cores, man, and don't stop at 8...
Interestingly enough, as you recompile for 64 bits, you also need more memory even as you gain access to more memory. Your memory alignment is now 8 bytes, not 4 bytes, and your pointers are much larger.
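A rough way to see this effect for yourself, sketched in Python with the standard ctypes module. The `Node` record is a made-up example, not something from the comment above, and the numbers it reports depend on the platform the interpreter was built for.

```python
import ctypes

# Hypothetical record: a one-byte tag followed by a pointer.
class Node(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_char), ("next", ctypes.c_void_p)]

pointer_size = ctypes.sizeof(ctypes.c_void_p)      # typically 4 on 32-bit builds, 8 on 64-bit
pointer_align = ctypes.alignment(ctypes.c_void_p)  # usually equal to pointer_size
node_size = ctypes.sizeof(Node)
# Bytes in the record that hold neither the tag nor the pointer.
padding = node_size - pointer_size - ctypes.sizeof(ctypes.c_char)

print(f"pointer: {pointer_size} bytes, aligned to {pointer_align}")
print(f"Node: {node_size} bytes, of which {padding} are padding")
```

On a typical 64-bit build the same one-byte tag ends up costing a full pointer-sized slot, which is where much of the extra memory goes in pointer-heavy data structures.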
I'd like to take a moment to rail against most commonly accepted forms of parallel education. I'm sure you were taught about threads, critical sections, semaphores, shared memory, etc.
These are all inherently dangerous concepts that are difficult to program with. Try writing an application that is flexible and can run with N threads: usually this is hard. The best solution from Java-land is the concurrency toolkit, which defines units of work that can be parallelized by a thread pool.
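The "units of work handed to a thread pool" idea can be sketched as follows. This uses Python's standard `concurrent.futures` module rather than the Java toolkit the comment refers to, and `word_count` and `count_all` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    """One independent unit of work: count the words in a chunk of text."""
    return len(text.split())

def count_all(chunks, workers=4):
    # The pool owns the threads; the caller only describes the work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(word_count, chunks))

chunks = ["two are enough for now", "four will be mainstream", "eight is too many"]
counts = count_all(chunks)
print(counts)
```

The number of threads is just a parameter here, so the same code runs with one worker or with N. The application never creates a thread, a lock, or a semaphore directly.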
However, there _is_ another way. "CSP" - communicating sequential processes. This is a method of writing naturally parallel systems that do not have the disadvantages of all of the above. (Standard concurrency debugging suggestion in Java: "make the method synchronized") A practical example of this is the programming language Erlang. Ericsson invented this language to write high performance telco gear. Their ATM switch line is written in it. In Erlang you have many, many 'processes' (not traditional OS processes, but defined in the VM) which cannot share memory - the only way they can communicate is via async messages. You can build a synchronous call on top of async messages pretty trivially (after all, all synchronous network protocols are based on IP, which is asynchronous). You never have to worry about memory stomps, or critical sections. You _do_ have to design your applications differently, but it is most definitely worth it.
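A minimal sketch of the message-passing style described above, written in Python rather than Erlang. One caveat is an assumption of this sketch: Python threads can share memory, so the isolation here is only a convention, whereas the comment says the Erlang VM enforces it. The names `counter_process` and `call` are hypothetical.

```python
import queue
import threading

def counter_process(mailbox):
    """A 'process' in the Erlang sense: private state, reachable only by messages."""
    total = 0                      # private state, never touched by other threads
    while True:
        msg = mailbox.get()
        if msg[0] == "add":        # asynchronous message: no reply expected
            total += msg[1]
        elif msg[0] == "get":      # request that carries the sender's reply mailbox
            msg[1].put(total)
        elif msg[0] == "stop":
            return

def call(mailbox, request):
    """Synchronous call built from two asynchronous messages."""
    reply_box = queue.Queue()
    mailbox.put((request, reply_box))
    return reply_box.get(timeout=5)

mailbox = queue.Queue()
worker = threading.Thread(target=counter_process, args=(mailbox,))
worker.start()

for n in (1, 2, 3):
    mailbox.put(("add", n))        # fire and forget
result = call(mailbox, "get")      # blocks until the reply arrives
mailbox.put(("stop",))
worker.join()
print(result)
```

The mailbox is first-in, first-out, so the three "add" messages are handled before the "get" request. No lock appears anywhere in the application code because only one thread ever touches `total`.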
Another interesting thing about this is that your applications naturally parallelize. The "R11" release was just put out, which included SMP support. The previous versions would only use 1 CPU, but this version will use all your CPUs, which means if you have multiple processes ready to run, they'll run on as many CPUs as you have! Instant SMP support, no redesign, no RECOMPILE necessary.
This kind of language technology is what is necessary to get us to the next level. A similar thing is possible with Functional languages such as OCaml, Haskell, etc.
I've been working in the industry for 5 years and I'm currently working on an Erlang project. My company was fairly conservative in terms of languages; there was a standing order (until about 2000) of "no C++".
Uh, all your examples are only serial in implementation, not serial in nature.
A webpage, for example, need not be parsed serially, though the performance of current systems is high enough that you gain nothing by attempting to parallelize the renderer. A printer, however, can trivially be designed to be parallel, especially if you have unusually high DPI. Think of a printer rendering to paper in the same way that a graphics card renders to a framebuffer. If you can use multiple pipelines, GPUs, and cards to accelerate video display, why wouldn't the same be possible for printing? The neat thing about printers and printed data is that there is no dependence: the image in the upper right exists independently of the image in the lower right, and so on. In theory you could have a core assigned to every PIXEL printed on a page, with a corresponding printhead for each core, and you would be able to print an entire page in a single CPU cycle. Technically.
So there are plenty of other things that could be executed on multiple cores:
Decoding video (playback)
Encoding video (storage, rendering, chat)
AI for games (imagine simulating a multitasking AI on multiple cores)
Physics for games (uncoupled events can be processed independently and coupled events require access to the same data)
Yes, everything has a serial bottleneck, such as data access, but once properly set up, most things can be made multicore as well. Saving a file, for example, can be multicore if you imagine the write as happening all at once, rather than serially, with each core assigned to a write head, each write head then operating independently... Etc.
GPL Deconstructed
Actually, I've been discussing this with a friend recently.
Take NWN for instance. How about making a game where things are REALLY happening? So far most worlds are extremely static. MMORPGs are static in that nothing ever changes, you kill the Lord of Evil and he's back on his dark throne 5 minutes later. And in most RPGs things just stay there and wait for you to appear (say, you never miss a battle in progress, as they just stay there until you appear nearby so that you can conveniently join the battle).
For example, in NWN it's very clear that there are multiple factions living in the area. How about having kobolds, gnolls, wolves, etc. move around on their own, gather food, kill each other, reproduce, try to invade, etc.? Wouldn't it be neat if you could defeat the gnolls, then wander off for whatever reason, and when you return find that the kobolds have taken over the gnoll cave and increased their population, and Tymofarrar has got out of the cave and set fire to the town?
Of course, make it too realistic and it gets a bit weird... imagine having to kill kobold children and walking in on gnolls having sex.
The problem we are running into is that webpages were designed to be parsed serially; if they were designed to be parsed in parallel, then they would be woefully inefficient being parsed serially, which until now has been the norm. The same with your printer example.
:) Now imagine decoding HD video chat on the fly; unlike HD DVD, video encoded on the fly will not have the best encoding/compression/compute values, but rather average ones. Now imagine multi-chat, in which four people are talking and four HD video streams are being decoded at once.
:) My parents, for example, talking to their grand-daughter in HD, is clearly superior to seeing a 640x480 image, which is again clearly superior to 320x240. Multiple cores would allow for a much nicer, cleaner, 30fps 1024x768 video stream, especially if background tasks are occurring.
So imagine a situation where a webpage was DESIGNED to be parsed in parallel. The page hierarchy would be formatted into independent chunks that could be assigned to different threads and cores without first preparsing it. It would be like having an index built into the webpage such that different elements on the page could automatically, without additional effort, be spun off to different cores. A navigation bar, a banner, the main content, a link-box, and a footer, for example, could all be defined in a webpage such that as soon as the renderer sees that there are five elements on the page, each element is spun off to a different core to be handled.
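The indexed-page idea above can be sketched like this. The page format is entirely hypothetical: an index that maps each region name to its start and end offsets in the body, so a renderer can hand out regions without scanning the page first. `build_page`, `render_region` and `render_page` are made-up names.

```python
from concurrent.futures import ThreadPoolExecutor

REGIONS = [("nav", "Home | News"), ("banner", "BIG SALE"),
           ("content", "Who needs eight cores?"),
           ("links", "Related stories"), ("footer", "Contact us")]

def build_page(regions):
    """Concatenate the region texts and record where each one starts and ends."""
    index, body, offset = {}, "", 0
    for name, text in regions:
        index[name] = (offset, offset + len(text))
        body += text
        offset += len(text)
    return index, body

def render_region(name, text):
    # Stand-in for the real layout work on one region.
    return f"[{name}] {text.upper()}"

def render_page(index, body, workers=5):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The index alone says where each region lives, so every region
        # can be submitted immediately and rendered independently.
        futures = {name: pool.submit(render_region, name, body[start:end])
                   for name, (start, end) in index.items()}
        return {name: f.result() for name, f in futures.items()}

index, body = build_page(REGIONS)
rendered = render_page(index, body)
print(rendered["content"])
```

The sketch assumes the regions really are independent. Anything that lets one region affect another's layout would bring the serial dependency back.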
The same with a printer; if the printing language were designed up front to be parallel, rather than serial, you could see speedups in rendering, though such gains are probably negligible. An image, such as an embedded jpeg in a document, would be split into four, for four cores, and then rendered into the appropriate printing language, which might come in handy when 10 megapixel pictures become common. Now imagine a printer with four print heads. You could conceivably send four streams of data at once, which again could be fed by four cores (or a single core of course, if it pre-computed the data needed to be sent to the printer).
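Splitting an image into four parts for four workers might look roughly like the sketch below. The threshold "halftone" step is a stand-in for converting pixels into a printing language, and all function names are hypothetical. Threads are used to show the structure only; in standard CPython, CPU-bound work like this would likely need a process pool to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def halftone_strip(rows, threshold=128):
    """Convert grayscale rows (0-255) to 1-bit dots; each row is independent."""
    return [[1 if px < threshold else 0 for px in row] for row in rows]

def split_rows(image, parts):
    """Cut the image into `parts` horizontal strips of near-equal height."""
    size, extra = divmod(len(image), parts)
    strips, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        strips.append(image[start:end])
        start = end
    return strips

def render_parallel(image, parts=4):
    with ThreadPoolExecutor(max_workers=parts) as pool:
        done = list(pool.map(halftone_strip, split_rows(image, parts)))
    # map() preserves order, so the strips stitch straight back together.
    return [row for strip in done for row in strip]

# Small synthetic 8x10 grayscale image used as test data.
image = [[(x * 37 + y * 11) % 256 for x in range(8)] for y in range(10)]
dots = render_parallel(image)
print(len(dots), "rows rendered")
```

Because no strip needs data from another strip, the parallel result matches what a single worker would produce for the whole image.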
Decoding video: Uh, take a look at HD... that's pretty hardcore
Encoding video: Imagine now encoding an HD video chat on the fly
AI: I think you misunderstand. One AI enemy which formulates, simultaneously, five DIFFERENT responses from the same data structures... in other words, an AI of split mind. In the same way I can imagine writing four different responses to you, but only acting on one of them, an AI with multiple responses, but only a single action, becomes much richer, more unpredictable, and unbelievably more complex.
Physics: Physics is really a generic superset of graphics. Graphics is merely how light interacts with the data structures. Throw in gravity, sound, friction, and mass, and you have physics. For the same reason that graphics can use multiple cores, physics can too. Imagine if the 3d sound effects were split among two CPUs, just like frames are. Sound can be trivially represented as frames, much like graphics. Imagine the same with gravity being calculated by two CPUs every other frame, or friction, etc. You can calculate, for example, the spray pattern of a shotgun in time; the trajectory is known, the number of pellets is known, and the environment is known. Right now we approximate the intersection of a shotgun blast with a player or a structure, but with additional compute resources you can actually trace each pellet individually!
The same with falling rocks, a flooding room, etc.
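Tracing each pellet individually, as suggested above, can be sketched as below. The scene is an assumption made for illustration: a flat wall 10 m away, simple ballistic motion with gravity and no air drag, and an illustrative muzzle speed. None of these figures come from the thread.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

G = 9.81            # gravity, m/s^2
WALL_X = 10.0       # distance to a flat wall, metres (assumed scene)
SPEED = 400.0       # muzzle speed, m/s (illustrative figure)

def make_pellets(count, spread_deg=3.0, seed=1):
    """Give each pellet a small random yaw/pitch away from the barrel axis."""
    rng = random.Random(seed)
    spread = math.radians(spread_deg)
    pellets = []
    for _ in range(count):
        yaw = rng.uniform(-spread, spread)
        pitch = rng.uniform(-spread, spread)
        pellets.append((SPEED * math.cos(pitch) * math.cos(yaw),   # vx, toward the wall
                        SPEED * math.sin(pitch),                   # vy, upward
                        SPEED * math.cos(pitch) * math.sin(yaw)))  # vz, sideways
    return pellets

def trace(velocity):
    """Where one pellet meets the wall; needs no data from any other pellet."""
    vx, vy, vz = velocity
    t = WALL_X / vx                                  # flight time to the wall
    return (vy * t - 0.5 * G * t * t, vz * t)        # (height, sideways offset)

def trace_all(pellets, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace, pellets))

hits = trace_all(make_pellets(9))
print(len(hits), "pellets traced")
```

Each pellet is an uncoupled event, so the work divides across however many workers are available. Coupled events, such as pellets colliding with each other, would need shared data and would not split this cleanly.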
Most problems ARE parallelizable, I think; the only real question is whether you approach the problem from the outset with multiple cores in mind.