Intel Says to Prepare For "Thousands of Cores"

The thing's hollow - it goes on forever by stoolpigeon · 2008-07-02 08:42 · Score: 5, Funny

- and - oh my God - it's full of cores!

--
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?

Re:The thing's hollow - it goes on forever by sconeu · 2008-07-02 10:05 · Score: 4, Funny

No, not quite. It's CORES all the way down!

--
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Re:The thing's hollow - it goes on forever by kdemetter · 2008-07-02 10:26 · Score: 3, Informative

2001: A Space Odyssey, by Arthur C. Clarke.
Great book.

--
Slipping shoelaces ?
Re:The thing's hollow - it goes on forever by Maxo-Texas · 2008-07-02 10:49 · Score: 5, Funny

Don't give up! Stay the cores!

--
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Re:The thing's hollow - it goes on forever by joto · 2008-07-02 11:11 · Score: 4, Funny

You know, before they made it into a book, it was a perfectly good movie.
Re:The thing's hollow - it goes on forever by Joren · 2008-07-02 11:53 · Score: 2, Informative

The "Control" meme is from Get Smart, which came out a week or two ago. So yes, it is pretty recent...unless you happen to have watched the series from the 60s.

--
-- Joren
Re:The thing's hollow - it goes on forever by dryeo · 2008-07-02 17:00 · Score: 5, Informative

And before they made it into a movie it was an interesting short story. http://en.wikipedia.org/wiki/The_Sentinel_(short_story)
If you'd like to read it, it seems to be available as this PDF: http://econtent.typepad.com/TheSentinel.pdf

--
https://en.wikipedia.org/wiki/Inverted_totalitarianism

Not Sure I'm Getting It by gbulmash · 2008-07-02 08:44 · Score: 5, Insightful

I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't. If you have 1,000 half-gigahertz cores running a 3D simulation, you may be able to get 875 FPS out of Doom X at 1920x1440, but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?

--
Start a happiness pandemic

Re:Not Sure I'm Getting It by Delwin · 2008-07-02 08:46 · Score: 5, Informative

Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.

Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that.
Re:Not Sure I'm Getting It by Mordok-DestroyerOfWo · 2008-07-02 08:47 · Score: 5, Funny

My friends and I have lots of conversations about girls, how to get girls, how to please girls. However, until anything other than idle talk actually happens, this goes into the "wouldn't it be nice" category.

--
"Never let your sense of morals prevent you from doing what is right" - Salvor Hardin
Re:Not Sure I'm Getting It by zappepcs · 2008-07-02 08:54 · Score: 3, Interesting

IANACS, but if your program structure changes a bit, you can process the two different styles of instructions in different ways, so that the data a sequential group of tasks needs is already there when it is needed, sort of like doing things 6 steps ahead of yourself when possible. I know that makes no sense on the face of it, but at the machine-code level, by parsing instructions this way you know that 5 or 6 operations from now you will need register X loaded with byte 121 from location xyz. So while this core plods through the next few instructions, core this.plus.one prefetches the data at memory location xyz into register X... or something like that. That will break the serialization of the code. There are other techniques as well, and if the program is written for multicore machines, its machine code can be executed this way without interpretation by the machine/OS.
There is more than one type of CPU architecture, and the principles of execution vary between them. The same goes for RISC vs. CISC. I think the smaller the CPU's instruction set, the more likely it is that serialized tasks can be shared out among cores.

--
Support NYCountryLawyer RIAA vs People
Re:Not Sure I'm Getting It by CDMA_Demo · 2008-07-02 09:03 · Score: 4, Funny

My friends and I have lots of conversations about girls, how to get girls, how to please girls.
What, haven't you guys heard of simulation?
Re:Not Sure I'm Getting It by zarr · 2008-07-02 09:04 · Score: 2, Informative

How do those get sped up if you're opting for more cores instead of more cycles?
Algorithms that can't be parallelized will not benefit from a parallel architecture. It's really that simple. :( Also, many algorithms that are parallelizable will not benefit from an "infinite" number of cores. The limited bandwidth for communication between cores will usually become a bottleneck at some point.
Re:Not Sure I'm Getting It by Talennor · 2008-07-02 09:05 · Score: 4, Interesting

While prefetching data can be done using a single core, your post in this context gives me a cool idea.
Who needs branch prediction when you could just have 2 cores running a thread? Send each one executing instructions without a break in the pipeline and sync the wrong core to the correct one once you know the result. You'd still have to wait for results before any store operations, but you should probably know the branch result by then anyway.

--

//TODO: signature
Re:Not Sure I'm Getting It by ViperOrel · 2008-07-02 09:05 · Score: 3, Insightful

Just a thought, but I would say that 3 billion operations should be enough for just about any linear logic you could need solved. Where we run into trouble is in trying to use single processes to solve problems that should be solved in parallel. If having a thousand cores means that we can now run things much more efficiently in parallel, then maybe people will finally start breaking their problems up that way. As long as you can count the cores on one hand, your potential benefit from multithreading your problem is low compared to the effort of debugging. Once you have a lot of cores, the benefit increases significantly. (I see this helping a lot in image processing, pattern recognition, and natural language... not to mention robotics and general AI...)
Re:Not Sure I'm Getting It by pla · 2008-07-02 09:10 · Score: 5, Insightful

I'm no software engineer [...] but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?

As a software engineer, I wonder the same thing.

Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.

Within that, only those programs that wait for one particular hardware resource (CPU time) even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale to more than a handful of cores without a complete rewrite to choose entirely different algorithms, if such algorithms even exist for the intended purpose.
Re:Not Sure I'm Getting It by zappepcs · 2008-07-02 09:18 · Score: 3, Interesting

Indeed, and any tasks that are flagged as repeating can be repeated on a separate core from the cores executing serial instructions, so that IPC lets things that would otherwise happen serially happen concurrently. A simple high-level example is reading the configuration for your process, which may change at any time during your process due to outside influences. Let the reading of that happen out of band, since it is not part of the sequential string of instructions for executing your code. That way the config data is always correct, and your serially oriented code never needs to stop to check anything other than, say, $window.size=?, whose value is always kept updated by a different core.
Sorry if that is not a clear explanation. I just mean to say that since most of what we do is serially oriented, it's difficult to see how, at the microscopic level of the code, it can be broken up into parallel tasks. A 16% decrease in processing time is significant. Building OSes and compilers to optimize this would improve execution times greatly, just as threading does today. If threads are written correctly to work with multiple cores, it's possible to see significant time improvements there also.
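zappepcs's idea of reading configuration out of band can be sketched as a background thread. The thread refreshes a shared value, and the main loop only ever reads the latest copy. This is a minimal sketch that assumes a hypothetical `read_config` callable. The class name and the `window.size` key are illustrative only, not any real API.

```python
import itertools
import threading
import time


class ConfigWatcher:
    """Keeps the latest configuration fresh from a background thread.

    `read_config` is a hypothetical callable (e.g. one that re-reads a file
    or queries a window size). The worker loop only ever calls get().
    """

    def __init__(self, read_config, interval: float = 0.1):
        self._read = read_config
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._value = read_config()  # first read happens up front
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout (keep polling) and True
        # once stop() has been called (leave the loop).
        while not self._stop.wait(self._interval):
            value = self._read()
            with self._lock:
                self._value = value

    def start(self):
        self._thread.start()
        return self

    def get(self):
        with self._lock:
            return self._value

    def stop(self):
        self._stop.set()
        self._thread.join()


# Usage: a fake config source whose "window size" changes on every read.
counter = itertools.count()
watcher = ConfigWatcher(lambda: {"window.size": next(counter)}, interval=0.01).start()
time.sleep(0.05)
print(watcher.get())  # the exact number depends on thread timing
watcher.stop()
```

The main loop never blocks on the config source. It only takes a short lock to copy a reference. Whether a separate thread is worth it depends on how expensive and how frequent the real reads are.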

--
Support NYCountryLawyer RIAA vs People
Re:Not Sure I'm Getting It by mweather · 2008-07-02 09:19 · Score: 4, Insightful

Pleasing a woman is easy. Give her your credit card.
Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:21 · Score: 5, Insightful

That is what most current processors do and use branch prediction for. Even if you have a thousand cores, that's only 10 binary decisions ahead. You need to guess really well very often to keep your cores busy instead of syncing. Also, the further you're executing ahead, the more ultimately useless calculations are made, which is what drives power consumption up in long pipeline cores (which you're essentially proposing).
In reality parallelism is more likely going to be found by better compilers. Programmers will have to be more specific about the type of loops they want. Do you just need something to be performed on every item in an array or is order important? No more mindless for-loops for not inherently sequential processes.
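The Anonymous Coward's arithmetic can be checked quickly. If every outcome of n binary branches gets its own core, covering all paths needs 2^n cores. A given core count therefore covers floor(log2(cores)) decisions. The helper name is made up for illustration.

```python
def full_branch_depth(cores: int) -> int:
    """Number of consecutive binary decisions whose every outcome can run
    on its own core (eager execution of all paths).

    Covering all paths through n branches takes 2**n cores, so the answer
    is the largest n with 2**n <= cores.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    return cores.bit_length() - 1  # exact floor(log2(cores)) for ints


for cores in (2, 1000, 1024, 1_000_000):
    print(cores, "cores ->", full_branch_depth(cores), "decisions")
```

By this count, 1,000 cores cover 9 full levels (10 levels would need 1,024), and even a million cores only reach 19. That is why executing both sides of every branch does not scale far.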
Re:Not Sure I'm Getting It by sexconker · 2008-07-02 09:21 · Score: 2, Interesting

So instead of a pipeline you have a tree.
Great, except for the fact that it's incredibly inefficient and the performance gain is negligible.
Quantum computers will (in theory) allow us to do both at once.
Re:Not Sure I'm Getting It by jandrese · 2008-07-02 09:28 · Score: 4, Insightful

Process switching overhead is pretty low though, especially if you just have one thread hammering away and most everything else is largely idle. The fundamental limitation of being stuck with 1/1000 of the power of your 1000 core chip because your problem is difficult/impossible to parallelize is a real one.

From a practical standpoint, Intel is right that we need vastly better developer tools, and that most things that require ridiculous amounts of compute time can be parallelized if you put some effort into it.

--

I read the internet for the articles.
Re:Not Sure I'm Getting It by 192939495969798999 · 2008-07-02 09:33 · Score: 5, Insightful

I concur, furthermore I'd like to see one core per pixel, that would certainly solve your high-end gaming issues.

--
stuff |
Re:Not Sure I'm Getting It by Intron · 2008-07-02 09:35 · Score: 4, Insightful

I wonder who has the rights to all of the code from Thinking Machines? We are almost to the point where you can have a Connection Machine on your desktop. They did a lot of work on automatically converting code to parallel in the compiler and were quite successful at what they did. Trying to do it manually is the wrong approach. A great deal of CPU time on a modern desktop system is spent on graphics operations, for example. That is all easily parallelized.

--
Intron: the portion of DNA which expresses nothing useful.
Re:Not Sure I'm Getting It by Brian+Gordon · 2008-07-02 09:43 · Score: 3, Informative

Are you crazy? Context switches are the slowdown in multitasking OSes.
Re:Not Sure I'm Getting It by mikael_j · 2008-07-02 09:44 · Score: 3, Insightful

Obviously just adding more cores does little to speed up individual sequential processes, but it does help with multitasking, which is what I really think is the "killer app" for multi-core processors.
Back in the late 90s (it doesn't feel like "back in..." yet, but I'm willing to admit that it was about a decade ago), I decided to build a computer with an Abit BP6 motherboard, two Celeron processors and lots of RAM instead of a single higher-end processor, because I wanted to be able to multitask properly. My gamer friends mocked me for choosing Celeron processors. But for the price of a single-processor system, I got one that could run several "normal" apps plus one with heavy CPU usage without slowing down. The extra RAM also helped: I saw lots of people back then go for 128 MB of RAM and a faster CPU instead of "wasting" their money on RAM, and then curse their computer for being slow when it started swapping. There was also the upside of having Windows 2000 run as fast on my computer as Windows 98 did on my friends' computers...
/Mikael

--
Greylisting is to SMTP as NAT is to IPv4
Re:Not Sure I'm Getting It by hey! · 2008-07-02 09:53 · Score: 3, Insightful

Are you crazy? Context switches are the slowdown in multitasking OSes.
Unfortunately, multitasking OSes are not the slowdown in most tasks, exceptions noted of course.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:56 · Score: 5, Informative

True but misleading. The major cost of task switching is a hardware-derived one. It's the cost of blowing caches. The swapping of CPU state and such is fairly small by comparison, and the cost of blowing caches is only going up.

--
-josh
Re:Not Sure I'm Getting It by jonbryce · 2008-07-02 09:57 · Score: 4, Insightful

At the moment, I'm looking at Slashdot in Firefox, while listening to an mp3. I'm only using two out of my four cores, and I have 3% CPU usage.
Maybe when I post this, I might use a third core for a little while, but how many cores can I actually usefully use?
I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
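jonbryce's 43-second figure is what perfect linear scaling predicts. This assumes the 30-hour run already uses all four cores, and that the brute-force search splits evenly with no communication between workers: 30 h × 3,600 s/h × 4 / 10,000 ≈ 43 s. The helper name is hypothetical.

```python
def scaled_seconds(baseline_hours: float, baseline_cores: int, new_cores: int) -> float:
    """Ideal run time for an embarrassingly parallel search on new_cores.

    Assumes perfect linear scaling: the keyspace splits evenly, workers
    never communicate, and every new core is as fast as a baseline core.
    """
    baseline_seconds = baseline_hours * 3600
    return baseline_seconds * baseline_cores / new_cores


seconds = scaled_seconds(30, 4, 10_000)
print(f"{seconds:.1f} s")  # 43.2 s
```

Password cracking is one of the few workloads where this ideal comes close to reality. Each candidate key can be tested independently.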
Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:58 · Score: 3, Interesting

Of course, the billion threads design doesn't solve the "how do n cores efficiently share x amount of cache" problem at all.

--
-josh
Re:Not Sure I'm Getting It by hedwards · 2008-07-02 10:13 · Score: 3, Interesting

That's what I'm curious about. Having 2 cores is enough for most consumers, one for the OS and background tasks and one for the application you're using. And that's overkill for most users.
Personally, I like to multitask, and I'm really going to love it when we get to the point where I can have the OS on one core and one core for each of my applications. But even that probably tops out at fewer than 10 cores.
Certain types of tasks just don't benefit from extra cores, and probably never will. Things which have to be done sequentially are just not going to see any improvement with extra cores. And other things like compiling software may or may not see much of an improvement depending upon the design of the source.
But really, it's mainly things like raytracing and servers with many parallel connections that are most likely to benefit. And servers are still bound by bandwidth, probably well before they would hit the limits of multiple cores anyway.
Re:Not Sure I'm Getting It by Artuir · 2008-07-02 10:13 · Score: 2, Funny

Well, you see.. when posting somewhere like Slashdot that knows nothing about women or girls, anything pertaining to their habits or way of life is insightful and/or informative.
Re:Not Sure I'm Getting It by rrohbeck · 2008-07-02 10:17 · Score: 2, Informative

Yup. It's Amdahl's law.
This whole many core hype looks a lot like the Gigahertz craze from a few years ago. Obviously they're afraid that there will be no reason to upgrade. 2 or 4 cores, ok - you often (sometimes?) have that many tasks active. But significantly more will only buy you throughput for games, simulations and similar heavy computations. Unless we (IAACS too) rewrite all of our apps under new paradigms like functional programming (e.g. in Erlang.) Which will only be done if there's a good reason for it.
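rrohbeck's appeal to Amdahl's law can be made concrete. This is a minimal sketch for a fixed workload. It assumes a fraction p of the single-core run time parallelizes perfectly. The speedup on N cores is then 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p).

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup of a fixed workload on `cores` processors.

    parallel_fraction is the share of the original run time that can be
    spread perfectly across cores. The rest stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 64, 1000))
    print(f"p={p}: {row}")
```

With p = 0.9, for example, 1,000 cores give just under 10x, because the serial 10% never shrinks.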

--
thegodmovie.com - watch it
Re:Not Sure I'm Getting It by cpeterso · 2008-07-02 10:29 · Score: 5, Interesting

Now that 64-bit processors are so common, perhaps operating systems can spare some virtual address space for performance benefits.
The OPAL operating system was a University of Washington research project from the 1990s. OPAL uses a single address space for all processes. Unlike Windows 3.1, OPAL still has memory protection, and every process (or "protection domain") has its own pages. The benefit of sharing a single address space is that you don't need to flush the cache, because the virtual-to-physical address mappings do not change when you context switch. Also, pointers can be shared between processes because their addresses are globally unique.

--
cpeterso
Re:Not Sure I'm Getting It by frission · 2008-07-02 10:33 · Score: 2, Interesting

Maybe in some language, "for" loops will be meant to be processed sequentially, and "for each" loops can be parallelized?
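frission's distinction can be sketched in Python, which is an illustrative choice since none of the posters named a language. `for_each_parallel` and `running_total` are hypothetical helpers. The first hands independent items to a pool of workers. The second carries a dependency from one iteration to the next, so it cannot be split up as written.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    # Pure function: no shared state, so calls may run in any order.
    return x * x


def for_each_parallel(func, items, workers: int = 4):
    """'for each' semantics: items are independent, so the pool may
    schedule them across workers in any order. Results still come back
    in input order, because Executor.map preserves ordering."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def running_total(items):
    """'for' semantics: each step needs the previous step's total, so the
    iterations cannot simply be handed out to different cores as written."""
    total, out = 0, []
    for x in items:
        total += x
        out.append(total)
    return out


print(for_each_parallel(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
print(running_total([1, 2, 3, 4]))          # [1, 3, 6, 10]
```

CPython's global interpreter lock (the issue ceswiedler raises later in the thread) means CPU-bound threads won't truly run in parallel. `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface using separate processes.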
Re:Not Sure I'm Getting It by painehope · 2008-07-02 10:36 · Score: 5, Funny

They have been simulating it, that's why he said "My friends and I". *shudders*

--
PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
Re:Not Sure I'm Getting It by LandDolphin · 2008-07-02 10:41 · Score: 5, Insightful

"Having 2 cores is enough for most consumers"

Before, having 1 core was enough, and having 512 MB of RAM was enough for most consumers. Computing power grows, and software developers make use of that additional power. However, this will mainly affect the gaming industry.

--
Spelling and Grammar errors have been added to this post for your enjoyment
Re:Not Sure I'm Getting It by blahplusplus · 2008-07-02 10:50 · Score: 4, Informative

"Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that."
Unfortunately, all this is going to lead to bus and memory bandwidth contention; you're just shifting the burden from one point to another. Although there is a 'penalty' for task switching, there is an even greater bottleneck at the bus and memory bandwidth level.
IMHO, Intel would have to release a CPU on a card with specialized RAM chips, and segment the RAM like GPUs do, to get anything out of multicore over the long term. RAM is not keeping up, and the current architecture for PC RAM is awful for multicore: CPU speed is far outstripping bus and memory bandwidth. I am quite dubious of multi-core architecture; there are fundamental limits to the geometry of circuits. I'd be sinking my money into materials research, not gluing cores together and praying that CS and math guys come up with solutions that take advantage of it.
The whole history of human engineering and tool use is about taking something extremely complicated, offloading the complexity, and compartmentalizing it so that it's manageable. I see the opposite happening with multi-core.
Re:Not Sure I'm Getting It by skulgnome · 2008-07-02 10:57 · Score: 5, Informative

No. I/O is the slowdown in multitasking OSes.
Re:Not Sure I'm Getting It by Gilmoure · 2008-07-02 10:59 · Score: 2, Funny

I think I once figured out that, starting with 3 billion women on the planet, there were about 5 with mutual attraction with me. I think I've found two of them.

--
I drank what? -- Socrates
Re:Not Sure I'm Getting It by ceswiedler · 2008-07-02 11:07 · Score: 3, Insightful

Uh, last time I checked, Python had a single interpreter lock per process which made it unsuitable for heavily multithreaded programs. Java would be a better example of a scalable and multithread-aware language.
Re:Not Sure I'm Getting It by kv9 · 2008-07-02 11:08 · Score: 5, Funny

I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
29 hours 59 minutes 17 seconds?

--
Stop Computers/Cars Analogies on S
Re:Not Sure I'm Getting It by joto · 2008-07-02 11:34 · Score: 2, Insightful

In reality parallelism is more likely going to be found by better compilers. Programmers will have to be more specific about the type of loops they want. Do you just need something to be performed on every item in an array or is order important? No more mindless for-loops for not inherently sequential processes.
I disagree. Having the compiler analyze loops to find out if they are trivially parallelizable is easy, there's little need to change the language.
On the other hand, a language that was really designed for kilocores or megacores would be radically different from most modern languages, adding a few extra (un)loop-statements wouldn't do. Functional languages are a good bet. When everything is side-effect-free, there's no good reason why all of it can't be executed in parallel.
But maybe we need even more abstraction. And more time. It took quite a while after the invention of the programmable computer for someone to invent FORTRAN. And we still program in something resembling FORTRAN. Maybe what we really need are actual many-core computers so that someone really smart will use them, and finally figure out a way to program them that's practical. That's where I'll put my money. Wait and see!
Re:Not Sure I'm Getting It by curunir · 2008-07-02 11:46 · Score: 4, Insightful

...but other than that, what difference is it going to make?
This is, IMHO, the wrong question to be asking. Asking how current tasks will be optimized to take advantage of future hardware makes the fundamentally flawed assumption that the current tasks will be what's considered important once we have this kind of hardware.
But the history of computers has shown that the "if you build it, they will come" philosophy applies to the tasks that people end up wanting to accomplish. Time and again, new uses for computers have waited until we hit a certain performance threshold, whether in CPU, memory, bandwidth, disk space, video resolution or whatever, and then become the things we need our computers to do.
Take, for instance, the huge success of MP3s. There was a time not so long ago when people were limited to playing music off a physical CD. This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it. But technology advanced to the point where a) processors became fast enough to decode MP3s in real time without using the whole CPU and b) hard drives grew to the point where we had the capacity to store files that are 10% of the size of the files on the CD.
Similarly, it's likely that when we reach the point where we have hundreds or thousands of cores, new tasks will emerge that take advantage of the new capabilities of the hardware. It may be that those tasks are limited in some other way by one of the other components we use or by the as yet non-existent status of some new component, but it's only important that multiple cores play a part in enabling the new task.
In the near term, you can imagine a whole host of applications that would become possible when you get to the point where the average computer can do real-time H.264 encoding without affecting overall system performance. I won't guess at what might be popular further down the road, but there will be people who will think of something to do with those extra cores. And, in hindsight, we'll see the proliferation of cores as enabling our current computer-using behavior.

--
"Don't blame me, I voted for Kodos!"
Re:Not Sure I'm Getting It by geekoid · 2008-07-02 11:53 · Score: 4, Insightful

Why wouldn't each core have its own cache? It only needs to cache what it needs for its job.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:07 · Score: 2, Insightful

"Unfortunately all this is going to lead to bus and memory bandwidth contention, "
Good. Current bus needs to be redone.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:Not Sure I'm Getting It by Cynic.AU · 2008-07-02 12:12 · Score: 2, Interesting

Holy crap. I just realised what you were saying -- use parallelism, vast parallelism for BRANCH PREDICTION.
That's not really how concurrency works at the moment :) It happens at a much higher level, explicitly in the code itself. Take matrix multiplication, for instance: it's easy to see how that can be split up into multiple threads.
But calculating every possible state 'n' states into the future, with 2^n CPU cores, sounds like a good idea, sir! :) It is also not mutually exclusive with explicit multithreading (although each concurrent thread blows up the total number of states).
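Cynic.AU's matrix multiplication example is a classic case of explicit, programmer-visible parallelism. Each row of the product depends only on one row of A and all of B. Here is a minimal pure-Python sketch that farms rows out to a thread pool. The function names are illustrative, and real code would use an optimized numerical library instead.

```python
from concurrent.futures import ThreadPoolExecutor

Matrix = list[list[float]]


def row_times_matrix(row, b):
    # One output row depends only on one row of A and all of B,
    # so different rows can be computed independently of each other.
    cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]


def parallel_matmul(a: Matrix, b: Matrix, workers: int = 4) -> Matrix:
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns rows in the same order as the rows of `a`.
        return list(pool.map(lambda r: row_times_matrix(r, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))  # [[19, 22], [43, 50]]
```

Splitting by rows needs no locking, because no two workers ever write the same output element. As with any threaded CPython code, real CPU parallelism would need processes or a library that releases the interpreter lock.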
Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:15 · Score: 2, Insightful

Except that when running an algorithm on 1 core, you can have 900 cores running different outputs, based on the probability of a different outcome of the previous part of the process.
When it is actually determined, kill the 899 that were incorrect. In fact, what would probably happen is that they would all branch differently, so you might kill 400, then after running for a bit, 200, and so on. This would exponentially decrease the time it takes to solve it.
In fact, for some applications getting 'close enough' will do.
Example:
Chess. I move my pawn on the first move of the game. 18 processes start up on separate cores, each one calculating the next 5 possible steps. When the next move is made, it kills the processes that didn't calculate 5 steps from that move.
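geekoid's chess idea is a form of application-level speculative execution. This sketch assumes a hypothetical `explore` function standing in for a real game-tree search. It starts one search per possible opponent reply, using Black's 20 legal first moves, and discards the rest once the real reply is known. Note that `Future.cancel()` only prevents searches that have not started yet. A search that is already running cannot be killed this way; its result is simply ignored.

```python
from concurrent.futures import ThreadPoolExecutor


def explore(reply: str, depth: int) -> str:
    """Hypothetical stand-in for a real game-tree search: returns the line
    we would play if the opponent answers with `reply`."""
    return f"{reply}: prepared {depth}-ply line"


def speculate(candidate_replies, actual_reply, depth=5, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start one speculative search per possible opponent reply.
        futures = {r: pool.submit(explore, r, depth) for r in candidate_replies}
        # Once the real reply is known, drop every other branch. cancel()
        # only stops searches that have not started; running ones finish
        # in the background and their results are discarded.
        for reply, fut in futures.items():
            if reply != actual_reply:
                fut.cancel()
        return futures[actual_reply].result()


replies = ["a6", "a5", "b6", "b5", "c6", "c5", "d6", "d5", "e6", "e5",
           "f6", "f5", "g6", "g5", "h6", "h5", "Na6", "Nc6", "Nf6", "Nh6"]
print(speculate(replies, "e5"))  # e5: prepared 5-ply line
```

Stopping a search that is already running would need cooperative cancellation, for example a shared flag that `explore` checks periodically. That is one reason real engines prune inside the search instead.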

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:Not Sure I'm Getting It by kramerd · 2008-07-02 12:23 · Score: 3, Informative

Girls like it when you buy them things. Or when you pretend to listen. And when you shower.
Re:Not Sure I'm Getting It by kesuki · 2008-07-02 12:35 · Score: 5, Interesting

Yes, but if you have 1000 cores, each with 64k of cache, then you start to run into problems with memory throughput when computing massively parallel data.
Memory throughput has been the Achilles' heel of graphics processing for years now. And as we all know, splitting up a graphics screen into smaller segments is simple, so GPUs went massively parallel long before CPUs. In fact, you will soon be able to get over 1000 stream processing units in a single desktop graphics card.
So the real problem is memory technology: how can a single memory module consistently feed 1000 cores? For instance, say you want to do real-time n-pass encoding of an HD video stream... while playing an FPS online, running IM software, and running a strong anti-virus suite...
I have a horrible, horrible, ugly feeling that you'll never be able to get a system that can reliably do all that at the same time, just because they'll skimp on memory tech or interconnects, so most of the capabilities of a 1,000-core system will be wasted.

--
https://www.gnu.org/philosophy/free-sw.html
Re:Not Sure I'm Getting It by kesuki · 2008-07-02 12:46 · Score: 3, Informative

"Take, for instance, the huge success of mp3's. There was a time not so long ago when people were limited to playing music off a physical CD. This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it. But technology advanced to the point where a) processors became fast enough to decode mp3's in real time without using the whole CPU"
I started making MP3s with a 486 DX 75 MHz.
I could decode in real time on a 486 DX 75. As I recall, encoding took a bit of time, and I only had a 3 GB HDD that had been an upgrade to the system...
MP3 uses an asymmetric encoding algorithm: it takes more CPU to encode than to decode. If your MP3 player doesn't run correctly on a 486, it's because they designed in features not strictly needed to decode an MP3 stream.
Oh hey, I have an RCA Lyra MP3 player that isn't even as fast as a 486, but its decoder was designed for MP3 decoding.
Ogg decoding uses a beefier decoder; that's half the problem with getting Ogg support in devices not made for decoding video streams.

--
https://www.gnu.org/philosophy/free-sw.html
Re:Not Sure I'm Getting It by Salamander · 2008-07-02 13:23 · Score: 5, Informative

Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
OK, so now the piece that's running on each core runs really really fast . . . until it needs to wait for or communicate with the piece running on some other core. If you can do your piece in ten instructions but you have to wait 1000 for the next input to come in, whether it's because your neighbor is slow or because the pipe between you is, then you'll be sitting and spinning 99% of the time. Unfortunately, the set of programs that decompose nicely into arbitrarily many pieces that each take the same time (for any input) doesn't extend all that far beyond graphics and a few kinds of simulation. Many, many more programs hardly decompose at all, or still have severe imbalances and bottlenecks, so the "slow neighbor" problem is very real.
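Salamander's "99%" figure is a simple ratio. This sketch assumes a core alternates strictly between computing and waiting, with no overlap between the two. The function name is illustrative.

```python
def busy_fraction(work: float, wait: float) -> float:
    """Share of time a core spends computing when every `work` units of
    computation are followed by `wait` units of waiting on a neighbor or
    on the interconnect (no overlap between the two)."""
    return work / (work + wait)


util = busy_fraction(10, 1000)
print(f"busy {util:.1%}, idle {1 - util:.1%}")  # busy 1.0%, idle 99.0%
```

Overlapping communication with computation, or giving each core more work per message, is how real systems push this ratio back up.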
Many people's answer to the "slow pipe" problem, on the other hand, is to do away with the pipes altogether and have the cores communicate via shared memory. Well, guess what? The industry has already been there and done that. Multiple processing units sharing a single memory space used to be called SMP, and it was implemented with multiple physical processors on separate boards. Now it's all on one die, but the fundamental problem remains the same. Cache-line thrashing and memory-bandwidth contention are already rearing their ugly heads again even at N=4. They'll become totally unmanageable somewhere around N=64, just like the old days and for the same reasons. People who lived through the last round learned from the experience, which is why all of the biggest systems nowadays are massively parallel non-shared-memory cluster architectures.
If you want to harness the power of 1000 processors, you have to keep them from killing each other, and they'll kill each other without even meaning to if they're all tossed in one big pool. Giving each processor (or at least each small group of processors) its own memory with its own path to it, and fast but explicit communication with its neighbors, has so far worked a lot better except in a very few specialized and constrained cases. Then you need multi-processing on the nodes, to deal with the processing imbalances. Whether the nodes are connected via InfiniBand or an integrated interconnect or a common die, the architectural principles are likely to remain the same.
Disclosure: I work for a company that makes the sort of systems I've just described (at the "integrated interconnect" design point). I don't say what I do because I work there; I work there because of what I believe.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Not Sure I'm Getting It by earthforce_1 · 2008-07-02 13:52 · Score: 2, Insightful

You speed it up by rewriting sequential algorithms to run in parallel. It is surprising how many algorithms you would swear are inherently sequential can be rewritten to operate in parallel. Beyond that, you can have cores engaged in speculative execution, where the results may or may not be used. I could imagine a spell checker where multiple words and sentence fragments are dispatched to numerous cores for spelling/grammar checking. A compiler could devote a separate core to compiling/linking/optimizing each individual module or function.
Programmers don't think massively parallel and most programming languages (excluding hardware design languages such as Verilog/VHDL) are sequential in nature.

--
My rights don't need management.
Re:Not Sure I'm Getting It by Gazzonyx · 2008-07-02 14:01 · Score: 2, Insightful

Another thing to think about (besides cache coherency, ping-ponging between sockets over the bus, locking overhead, etc.): you can have a million cores and it won't matter. You're only as fast as your weakest link. Right now, that's storage, but solid-state drives will be common for first-tier storage within the next decade (as straight memory-bank storage becomes more common for high-performance applications), and average disk access time will improve by a few orders of magnitude. Still, that only moves the problem 'forward' a level.

You still choke on the Memory Wall: you have to feed all those cores data, and memory is a few orders of magnitude slower than the CPU cores. Increasing front side bus bandwidth alone doesn't help, as you have to both increase bandwidth and decrease latency. You compound this when you have many cores/sockets flushing their caches back to RAM.

Even if you've got a HyperTransport link to the north bridge for each socket (which Intel doesn't; IIRC they push bits between sockets over the front side bus), you've still only got a single north bridge. You're bottlenecked again. OK, use two front side buses with an interlink. Now we're back to coherency problems, but at two points. At some point, you have to either give each socket its own RAM bank (NUMA), isolate data, and accept that migrating a task to another CPU takes an extra hit, or figure out how to perfectly isolate and stripe your data over multiple paths to a single backing store.

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Re:Not Sure I'm Getting It by Erich · 2008-07-02 15:17 · Score: 5, Informative
Single Address Space is horrible.
It's a huge kludge for idiotic processors (like ARM9) that don't have physically tagged caches. On all non-incredibly-sucky processors, we have physically tagged caches, so having every app in its own address space, or having multiple apps share physical pages at different virtual addresses, is fine.
Problems with SAS:
- Everything has to be compiled position-independent, or pre-linked for a specific location
- Virtual memory fragmentation as applications are loaded and unloaded
- Where is the heap? Is there one? Or one per process?
- COW and paging get harder
- People start using it and think it's a good idea.
Most people... even people using ARM... are using processors with physically-tagged caches. Please, Please, Please, don't further the madness of single-address-space environments. There are still people encouraging this crime against humanity.
Maybe I'm a bit bitter, because some folks in my company have drunk the SAS kool-aid. But believe me, unless you have ARM9, it's not worth it!
--
-- Erich
Slashdot reader since 1997
Re:Not Sure I'm Getting It by poopdeville · 2008-07-02 17:08 · Score: 2, Interesting

Many cores could allow for slower clock speeds, cooler chips and quieter computers.
Of course, an OS could be designed so that different modular components run on different cores.
More is possible if you have thousands of cores. A machine with thousands of cores could conceivably pre-compute the possible computational consequences of your in-a-standard-deviation-most-likely actions, based on a genetic learning algorithm to figure out what you do when. In a sense, the more predictable you are, the faster it would get. Imagine an iPhone that does that!

--
After all, I am strangely colored.
Re:Not Sure I'm Getting It by Stan+Vassilev · 2008-07-02 17:32 · Score: 4, Insightful

As a software engineer, I wonder the same thing.
Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
Within that, only those programs that wait for a particular hardware resource (CPU time) even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale to more than a handful of cores without a complete rewrite to choose entirely different algorithms (if such algorithms even exist for the intended purpose).
As a software engineer you should know that "most code doesn't parallelize" is very different from "most of the code's runtime can't parallelize", as code size and code runtime are substantially different things.
Look at most CPU intensive tasks today and you'll notice they all parallelize very well: archiving/extracting, encoding/decoding (video, audio), 2D and 3D GUI/graphics/animations rendering (not just for games anymore!), indexing and searching indexes, databases in general, and last but not least, image/video and voice recognition.
So, while your very high-level task is sequential, the *services* it calls or implicitly uses (like GUI rendering), and the smaller tasks it performs, actually would make a pretty good use of as many cores as you can throw at them.
This is good news for software engineers like you and me, as we can write mostly serial code and isolate slow tasks into isolated routines that we write once and reuse many times.
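The point about runtime versus code size is usually formalized as Amdahl's law. If a fraction p of a program's single-core runtime can be parallelized perfectly across N cores, the best possible speedup is 1 / ((1 - p) + p / N). The small Python sketch below uses a function name of our own and shows how the serial remainder caps the gain no matter how many cores you add.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup (Amdahl's law) when `parallel_fraction` of the
    single-core runtime splits perfectly across `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 64, 1000))
        print(f"p={p}: {row}")
```

Even at p = 0.99, 1000 cores give about 91x, because 1/(1 - p) = 100 is the ceiling. This is why the runtime-heavy services listed above (codecs, rendering, search) matter far more than how much of the code itself is serial.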
Re:Not Sure I'm Getting It by makapuf · 2008-07-02 19:54 · Score: 2, Insightful

Why "before"? I think 512MB RAM / 1 or 2 GHz + a decently fast hard drive IS enough for most consumers: playing (moderately recent) games (maybe after upgrading to a newer $50 video card), playing (moderate) HD video and MP3s, browsing sites (even ones using lots of Ajax, on FF3), doing any office work.
You know what? You could even (gasp) code on it (maybe not compile Eclipse every 5 minutes, OK), run a small server on it, or transcode videos (maybe 4x more slowly, so you'll end up letting it run overnight instead of for 2 hours from time to time; big deal).
Of course, SOME people might need more. For most of us, 512MB / 1x2GHz is perfectly enough (see the Eee PC).
Re:Not Sure I'm Getting It by StatusWoe · 2008-07-02 23:56 · Score: 2, Interesting

Why do all the processors have to be the same? Why not have an x-core processor for the smaller tasks that are easily parallelizable and a high-clock-rate processor for the ones that aren't? The same might be done for cache requirements.
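A toy model of that suggestion, extending Amdahl's law under loudly stated assumptions. One fast core, `fast_speed` times quicker than each of N simple cores, runs the serial part; the simple cores alone run the parallel part. Speedup is measured against a single simple core. The model and names are ours and ignore die area, power, and memory effects.

```python
def hetero_speedup(parallel_fraction: float, simple_cores: int, fast_speed: float) -> float:
    """Toy model: the serial part runs on one fast core (fast_speed x a simple
    core); the parallel part runs on `simple_cores` simple cores. The result is
    relative to running everything on a single simple core."""
    serial_time = (1.0 - parallel_fraction) / fast_speed
    parallel_time = parallel_fraction / simple_cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    p, n = 0.9, 100
    for speed in (1.0, 2.0, 4.0):
        print(f"fast core {speed}x: speedup {hetero_speedup(p, n, speed):.1f}")
```

With p = 0.9 and 100 simple cores, making the one serial core 4x faster raises the overall speedup from about 9.2x to about 29.4x, which is the intuition behind mixing core types.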

--
"drink deeply the illusion of your safety"

Great... by Amarok.Org · 2008-07-02 08:44 · Score: 4, Funny

As if Oracle licensing wasn't complicated enough already...

--
-- "Other than that, how was the play Mrs. Lincoln?"

Re:Great... by Penguinisto · 2008-07-02 09:15 · Score: 2, Interesting

...then again, I can see it as an argument for vendors to finally --finally!-- stop counting "processors" as their license-limit metric. And yes, VMware, I'm talking to you too when I say that.
/P

--
Quo usque tandem abutere, Nimbus, patientia nostra?

Memory bandwidth? by Brietech · 2008-07-02 08:45 · Score: 5, Interesting

If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?

--
I'm perfect in every way, except for my humility.

Re:Memory bandwidth? by smaddox · 2008-07-02 08:55 · Score: 2, Interesting

Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
If you have 1000 cores that depend on the same data, you would have to have a way of multicasting the data to the cores, which could then select the data they want.
Basically, hardware and software architecture has to be completely redefined.
It is not impossible, though. Just look around. The universe computes in parallel all the time.
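A Python sketch of the multicast idea. It assumes software queues stand in for what would really be an on-chip broadcast network. Each item is sent once to every core's inbox, and each core keeps only the items that match its own filter. The names and the `STOP` sentinel are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable

STOP = object()  # sentinel marking the end of the broadcast stream


def core(core_id: int, wants: Callable[[int], bool], inbox: queue.Queue,
         results: dict[int, list[int]]) -> None:
    """Receive every broadcast item, keep only the ones this core selects."""
    kept = []
    while (item := inbox.get()) is not STOP:
        if wants(item):
            kept.append(item)
    results[core_id] = kept  # each core writes only its own key


def multicast(items: Iterable[int], filters: list[Callable[[int], bool]]) -> list[list[int]]:
    inboxes = [queue.Queue() for _ in filters]
    results: dict[int, list[int]] = {}
    threads = [
        threading.Thread(target=core, args=(i, f, inboxes[i], results))
        for i, f in enumerate(filters)
    ]
    for t in threads:
        t.start()
    for item in items:
        for box in inboxes:  # fan the same item out to every core
            box.put(item)
    for box in inboxes:
        box.put(STOP)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(filters))]


if __name__ == "__main__":
    evens_and_big = [lambda x: x % 2 == 0, lambda x: x > 6]
    print(multicast(range(10), evens_and_big))  # [[0, 2, 4, 6, 8], [7, 8, 9]]
```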
Re:Memory bandwidth? by lazyDog86 · 2008-07-02 08:57 · Score: 2, Insightful

I would assume that if you have enough transistors to have thousands of cores that you will be able to put on a lot of SRAM cache as well - just drop a few hundred or thousand cores. You won't be able to integrate DRAM since it requires a different process, but SRAM should be integrated easily enough.

--
my insights may be modded Funny, but at least some of my jokes are modded Insightful
Re:Memory bandwidth? by tt465857 · 2008-07-02 08:59 · Score: 2, Interesting

3D integration schemes, which IBM and Intel are both pursuing, help deal with this problem. As you noted, you can't put enough pins on a chip with traditional packaging to achieve a sufficient memory bandwidth. But with 3D integration, the memory chips are connected directly to the CPUs with "through-chip vias". You can have tens of thousands of these vias, and as a bonus, the distance to the memory is extremely short, so latency is reduced.
- Trevor -
[[self-construction]]: The autotherapeutic diary of a crazy geek's journey back to mental health
Re:Memory bandwidth? by Gewalt · 2008-07-02 09:12 · Score: 2, Insightful

Not really. If you can put 1000 cores on a processor, then I don't see why you can't put 100 or so layers of RAM on there too. Eventually, it will become a requirement to get the system to scale.

--
Modding Trolls +1 inciteful since 1999
Re:Memory bandwidth? by bluefoxlucid · 2008-07-02 09:33 · Score: 4, Insightful

Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
Yes, in Intel land. AMD has this thing called NUMA. What do you think "HyperTransport" means?

--
Support my political activism on Patreon.
Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 09:35 · Score: 2, Insightful

You need a basic course in TTL. No, they haven't figured this out, and putting address decoders on the chip makes very little difference when you scale. They also haven't figured out communication between cores. We had thousands of CPUs rigged up with transputers back in the '80s. It was a nightmare, and near useless for just about everything. We had to use serial data to make things sane.
The more logic you have, the longer the signal path. The longer the signal path, the harder it is to sync on the clock pulse. The higher the clock frequency, the less like a square wave the signal is; it starts to look like a ramp.
There are huge problems with scaling, whether it's speed or cores. If Intel wants us to have all these cores, their engineers are going to have to overcome the same problems parallel programming has had for 30 years or more.

Disagreement about this trend by Raul654 · 2008-07-02 08:46 · Score: 5, Interesting

At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general, and tried to predict what computers (especially Supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel -- most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton

Re:Disagreement about this trend by RailGunSally · 2008-07-02 08:57 · Score: 5, Funny

Sure! 64 cores should be enough for anybody!
Re:Disagreement about this trend by tzhuge · 2008-07-02 09:11 · Score: 3, Funny

Sure, until a killer app like Windows 8 comes along and requires a minimum of 256 cores for email, web browsing and word processing. Interpret 'killer app' how you want in this context.
Re:Disagreement about this trend by RightSaidFred99 · 2008-07-02 09:22 · Score: 4, Insightful

His premise is flawed. People using email, running a web browser, etc... hit CPU speed saturation some time ago. A 500MHz CPU can adequately serve their needs. So they are not at issue here. What's at issue is next generation shit like AI, high quality voice recognition, advanced ray tracing/radiosity/whatever graphics, face/gesture recognition, etc... I don't think anyone sees us needing 1000 cores in the next few years.
My guess is 4 cores in 2008, 4 cores in 2009, moving to 8 cores through 2010. We may move to a new uber-core model once the software catches up, more like 6-8 years than 2-4. I'm positive we won't "max out" at 64 cores, because we're going to hit a per-core speed limit much more quickly than we hit a number-of-cores limit.
Re:Disagreement about this trend by the_olo · 2008-07-02 09:25 · Score: 3, Interesting

So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this)
You've excluded gamers as if this had been some nearly extinct exotic species. Don't they contribute the most to PC hardware market growth and progress?
Re:Disagreement about this trend by eht · 2008-07-02 09:31 · Score: 2, Interesting

We've pretty much already hit a per-core speed limit; you really can't find many CPUs running over 3GHz, whereas back in the P4 days you'd see them all the way up to 3.8GHz.
Architectures have changed and other stuff allow a current single core of a 3.2 to easily outperform the old 3.8's but then still why don't we see new 3.8's?
Re:Disagreement about this trend by MojoRilla · 2008-07-02 09:34 · Score: 5, Insightful

This seems silly. If you create more compute power, someone will think of ways to use it.

Web applications are becoming more AJAX'y all the time, and they are not sequential at all. Watching a video while another tab checks my Gmail is a parallel task. All indications are that people want to consume more and more media on their computers. Things like the MLB mosaic allow you to watch four games at once.

Have you ever listened to a song through your computer while coding, running an email program, and running an instant messaging program? There are four highly parallelizable tasks right there. Not compute intensive enough for you? Imagine the song compressed with a new codec that is twice as efficient in terms of size but twice as compute intensive. Imagine the email program indexing your email for efficient search, running algorithms to assess the email's importance to you, and virus checking new deliveries. Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.

"Normal" users are doing more and more with computers as well. Now that fast computers are cheap, people who never edited video or photos are doing it. If you want a significant market besides gamers who need more cores, it is people making videos, especially HD videos. Sure, my Grandmother isn't going to be doing this, but I do, and I'm sure my children will do it even more.

And don't forget about virus writers. They need a few cores to run on as well!

Computer power keeps its steady progress higher, and we keep finding interesting things to do with it all. I don't see that stopping, so I don't see a limit to the number of cores people will need.
Re:Disagreement about this trend by BlueHands · 2008-07-02 09:38 · Score: 2, Insightful

I KNOW it is very often cited, but if ever there was a time to mention the "5 computers in the whole world" line, it is this. In fact, I would dare say that is the whole point of this push by Intel: getting people (programmers) used to the idea of having so many parallel CPUs in a home computer.
Sure, from where we stand now, 64 seems like a lot, but maybe a core for nearly each pixel on my screen makes sense and has real value to add. Or how about just flat-out smarter computers, something which might happen by simulating 100 neurons per core? As far as I understand it, speech recognition can always use more power. Let me put it differently:
That games will require a lot of computing power in the future makes sense to you, but not that anything else will. The same would have been said about a high-end gaming rig just a handful of years ago, and yet a low-end PC today has amazing graphics, amazing everything, compared to what things were just 10 years ago. And it gets used, much of the time. If we have the power, we will use it. Games just push the envelope further, sooner, but they don't go anywhere that we all wouldn't like to go anyway.
I cannot think of a single task in a game that I would not want to be able to do in real life. Games are about living an idealized life, of some sort, inside your computer. The next step is to bring it out here, to the rest of the world.

--
I mod everyone down who says "I'll get modded down for this." I hate to disappoint.
Re:Disagreement about this trend by drinkypoo · 2008-07-02 09:42 · Score: 5, Interesting

Architectures have changed and other stuff allow a current single core of a 3.2 to easily outperform the old 3.8's but then still why don't we see new 3.8's?
The Pentium 4 is, well, it's scary. It actually has "drive" stages because it takes too long for signals to propagate between functional blocks of the processor. This is just wait time, for the signals to get where they're going.
The P4 needed a super-deep pipeline to hit those kinds of speeds as a result, and so the penalty for branch misprediction was too high.
What MAY bring us higher clock rates again, though, is processors with very high numbers of cores. You can make a processor broad, cheap, or fast, but not all three. Making the processors narrow and simple will allow them to run at high clock rates and making them highly parallel will make up for their lack of individual complexity. The benefit lies in single-tasking performance; one very non-parallelizable thread which doesn't even particularly benefit from superscalar processing could run much faster on an architecture like this than anything we have today, while more parallelizable tasks can still run faster than they do today in spite of the reduced per-core complexity due to the number of cores - if you can figure out how to do more parallelization. Of course, that is not impossible.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Disagreement about this trend by jsebrech · 2008-07-02 10:29 · Score: 2, Informative

Architectures have changed and other stuff allow a current single core of a 3.2 to easily outperform the old 3.8's but then still why don't we see new 3.8's?
Clock rate alone is meaningless. They could build a 10GHz CPU, but it wouldn't outperform the current 3GHz CPUs.
A modern CPU uses pipelining. This means that each instruction is spread out across a series of phases (e.g. fetch data, perform calculation 1, perform calculation 2, store data). Each phase is basically a layer of transistors the logic has to go through. The clock rate is simply how often data is transferred to the next phase. The higher you push the clock, the faster instructions move through their phases towards completion. The problem is that the transistors in each phase take a while after every clock tick to stabilize. So, if you push the clock rate too high, the end result of your current phase won't have been reached yet, and you'll push garbage to the next phase. This is why a CPU that is overclocked too far will cause crashes. It simply doesn't do reliable calculation anymore.
Now, the reason you had higher clock rates on the P4 architecture is that Intel "solved" the clock rate problem by having more phases and making each phase shorter. Overall the CPU was less efficient, but they could put a bigger GHz number on the package, so marketing was happy. They've come back from that because they couldn't compete on cost/performance with someone who didn't do that (AMD), and their current architecture has appropriate-length phases again, with a lower clock rate to match.
As you've observed, however, overall speed has gone up.
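The pipeline trade-off described in this thread can be put in numbers with a deliberately crude Python model. Everything here is an illustrative assumption, not P4 or Core data. A fixed amount of logic delay per instruction is split evenly across the stages, each stage adds a fixed latch overhead, and every mispredicted branch flushes the pipeline at a cost of one cycle per stage.

```python
def pipeline_model(stages: int, logic_delay_ns: float = 10.0,
                   latch_overhead_ns: float = 0.1,
                   mispredicts_per_instr: float = 0.01) -> tuple[float, float]:
    """Return (clock in GHz, instructions completed per ns) for a toy pipeline.

    All default parameters are hypothetical. Without stalls the pipeline
    retires one instruction per cycle; each mispredict costs `stages` cycles.
    """
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    clock_ghz = 1.0 / cycle_ns
    cycles_per_instr = 1.0 + mispredicts_per_instr * stages
    instr_per_ns = 1.0 / (cycle_ns * cycles_per_instr)
    return clock_ghz, instr_per_ns


if __name__ == "__main__":
    for k in (10, 30, 100, 300):
        ghz, ipns = pipeline_model(k)
        print(f"{k:>3} stages: {ghz:4.1f} GHz, {ipns:.2f} instr/ns")
```

With these made-up numbers the clock keeps climbing as stages are added. Delivered throughput, though, peaks at 100 stages and then falls, because every extra stage makes each misprediction more expensive. Where the peak lands depends entirely on the assumed parameters.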
Re:Disagreement about this trend by felipekk · 2008-07-02 11:11 · Score: 2, Funny

Ah, I see you are running Vista...
j/k though, I have a single core running Vista x64 and I love it. It's responsive as hell (seriously).

Ok.. so how do I do that? by bigattichouse · 2008-07-02 08:47 · Score: 2, Interesting

Are we just looking at crazy-ass multithreading, or do you mean we need some special API? I think it's really the compiler gurus who are going to make the difference here; 99% of the world can't figure out debugging multithreaded apps. I'm only moderately successful with it if I build small single-process "kernels" (to steal a graphics term) that each process a work item, and then a loader that keeps track of work items... then fire up a bunch of threads and feed the cloud a bunch of discrete work items. Synchronizing threads is no fun.
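The "kernel plus loader" pattern described above can be sketched with Python's standard `queue` module, which does the locking for you. A loader puts discrete work items on a queue, and a fixed pool of threads each runs the same single-item kernel. Results come back tagged with their index, so no shared list needs a lock. `run_work_items` is a hypothetical name. The sketch does no error handling: a kernel that raises kills its worker thread and leaves `None` in the result list.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_DONE = object()  # sentinel: tells a worker there is no more work


def run_work_items(items: Iterable[Any], kernel: Callable[[Any], Any],
                   num_threads: int = 4) -> list[Any]:
    """Feed discrete work items to a pool of threads running one kernel."""
    work: queue.Queue = queue.Queue()
    done: queue.Queue = queue.Queue()

    def worker() -> None:
        # Each worker pulls items until it sees the sentinel.
        while (job := work.get()) is not _DONE:
            index, item = job
            done.put((index, kernel(item)))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()

    # The "loader": enqueue every item, then one sentinel per worker.
    item_list = list(items)
    for job in enumerate(item_list):
        work.put(job)
    for _ in threads:
        work.put(_DONE)
    for t in threads:
        t.join()

    # All workers have exited, so draining `done` here is race-free.
    results: list[Any] = [None] * len(item_list)
    while not done.empty():
        index, value = done.get()
        results[index] = value
    return results


if __name__ == "__main__":
    print(run_work_items(range(8), lambda n: n * n, num_threads=3))
```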

--
meh

Re:Ok.. so how do I do that? by Phroggy · 2008-07-02 09:02 · Score: 4, Informative

A year or so ago, I saw a presentation on Thread Building Blocks, which is basically an API thingie that Intel created to help with this issue. Their big announcement last year was that they've released it open-source and have committed to making it cross-platform. (It's in Intel's best interest to get people using TBB on Athlon, PPC, and other architectures, because the more software is multi-core aware, the more demand there will be for multi-core CPUs in general, which Intel seems pretty excited about.)

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;

been there, done that by frovingslosh · 2008-07-02 08:49 · Score: 5, Funny

Heck, my original computer had 229376 cores. They were arranged in 28k 16 bit words.

--
I'm an American. I love this country and the freedoms that we used to have.

Downright neat by Alarindris · 2008-07-02 08:51 · Score: 2, Funny

640K cores should be enough for anybody.

Good idea by Piranhaa · 2008-07-02 08:52 · Score: 4, Insightful

It's a good idea. It's somewhat the same idea the Cell chip has going for it (and, well, Phenom X3s): you make a product with lots of redundant units so that when some inevitably fail, the percentage of capacity lost is much lower.

If there are 1000 cores on a chip and 100 go bad, you're still only losing a *maximum* of 10% of performance, whereas when you have 2 or 4 cores and 1 or 2 go bad, you take a performance hit of essentially 50%. That brings costs down, because yields go up dramatically.
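The yield argument can be made concrete with a binomial model. It rests on the simplifying, hypothetical assumption that each core is defective independently with the same probability. A chip is sellable if at least `min_good` cores work. The numbers below are illustrative, not real fab data.

```python
from math import comb


def sellable_fraction(cores: int, min_good: int, defect_prob: float) -> float:
    """P(at least `min_good` of `cores` cores work), assuming independent
    per-core defects with probability `defect_prob`."""
    good = 1.0 - defect_prob
    return sum(
        comb(cores, j) * good**j * defect_prob ** (cores - j)
        for j in range(min_good, cores + 1)
    )


if __name__ == "__main__":
    q = 0.05  # hypothetical per-core defect rate
    print(f"4 cores, all must work:    {sellable_fraction(4, 4, q):.3f}")
    print(f"1000 cores, 900 must work: {sellable_fraction(1000, 900, q):.6f}")
```

With a 5% per-core defect rate, requiring all four cores to work discards about 19% of dies. A 1000-core part that only needs 900 working cores is essentially always sellable. Real defects are not independent and can also hit shared logic such as caches or the interconnect, so this is only a first-order picture.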

Already Happening by sheepweevil · 2008-07-02 08:55 · Score: 3, Informative

Supercomputers already have many more than thousands of cores. The IBM Blue Gene/P can have up to 1,048,576 cores. What Intel is probably talking about is bringing that level of parallel computing to smaller computers.

Re:Useless by CastrTroy · 2008-07-02 08:59 · Score: 5, Insightful

Well, parallel programming is hard. It's not so hard that it can't be done, but it's harder than sequential programming. Unless your app will have a specific advantage because of this parallel programming, then it isn't worth the effort to do it in the first place. The nice thing however, would be that you could run each process on a separate core, and there wouldn't be any task switching needed. This would speed things up quite a bit. Also, if you locked a process or thread to each core, then one slow down wouldn't take out the entire system.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.

Declarative languages is the answer by olvemaudal · 2008-07-02 08:59 · Score: 3, Interesting

In order to utilize mega-core processors, I believe that we need to rethink the way we program computers. Instead of using imperative programming languages (eg, C, C++, Java) we might need to look at declarative languages like Erlang, Haskell, F# and so on. Read more about this at http://olvemaudal.wordpress.com/2008/01/04/erlang-gives-me-positive-vibes/

Re:Declarative languages is the answer by tanadeau · 2008-07-02 15:10 · Score: 2, Informative

Declarative languages are ones like Prolog. You're talking about functional programming (Lisp, Haskell, Erlang, OCaml, etc.) which is a wholly different (and easier to understand) beast.

Lookahead/predictive branching is one option... by Cordath · 2008-07-02 09:00 · Score: 4, Interesting

Say you have a slow, plodding sequential process. If you reach a point where there are several possibilities and you have an abundance of cores, you can start work on each of the possibilities while you're still deciding which possibility is actually the right one. Many CPUs already incorporate this sort of logic. It is, however, rather wasteful of resources and provides a relatively modest speedup. Applying it at a higher level should work, in principle, although it obviously isn't going to be practical for many problems.
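A Python sketch of that higher-level speculation. The stand-in functions are hypothetical, and `time.sleep` takes the place of real work. Every candidate branch starts immediately, the slow decision runs alongside them, and only the winning branch's result is kept. As the post says, the losing branches are wasted work; `Future.cancel()` only helps for branches that haven't started yet.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_decision() -> str:
    """Hypothetical stand-in for the computation that picks the branch."""
    time.sleep(0.05)
    return "b"


def branch_work(name: str) -> str:
    """Hypothetical stand-in for the expensive work down one branch."""
    time.sleep(0.05)
    return f"result of branch {name}"


def speculate(branches: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=len(branches) + 1) as pool:
        # Launch every candidate before we know which one is needed.
        futures = {name: pool.submit(branch_work, name) for name in branches}
        decision = pool.submit(slow_decision).result()
        for name, fut in futures.items():
            if name != decision:
                fut.cancel()  # no effect on branches already running
        return futures[decision].result()


if __name__ == "__main__":
    start = time.perf_counter()
    print(speculate(["a", "b", "c"]))
    # Branches overlap the decision, so this is close to one sleep, not two.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```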

I do see this move by Intel as a direct follow-up to their plans to negate the processing advantages of today's video cards. Intel wants people running general-purpose code to run it on their general-purpose CPUs, not on their video cards using CUDA or the like. If the future of video game rendering is indeed ray-tracing (an embarrassingly parallel algorithm if ever there was one), then this move will also position Intel to compete directly with Nvidia for the raw processing power market.

One thing is for sure, there's a lot of coding to do. Very few programs currently make effective use of even 2 cores. Parallelization of code can be quite tricky, so hopefully tools will evolve that will make it easier for the typical code-monkey who's never written a parallel algorithm in his life.

Is that really a good idea? by neokushan · 2008-07-02 09:03 · Score: 3, Interesting

I'm all for newer, faster processors. Hell, I'm all for processors with lots of cores that can be used, but wouldn't completely redoing all of the software libraries and such that we've got used to cause a hell of a divide in developers?
Sure, if you only develop on an x86 platform, you're fine, but what if you want to write software for ARM or PPC? Processors that might not adopt the "thousands of cores" model?
Would it not be better to design a processor that can intelligently utilise single threads across multiple cores? (I know this isn't an easy task, but I don't see it being much harder than what Intel is proposing here).
Or is this some long-time plan by intel to try to lock people into their platforms even more?

--
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill

Re:Generic jokes by TaoPhoenix · 2008-07-02 09:03 · Score: 5, Funny

In the Soviet Union ...

Oh wait... the Soviet Union already broke into smaller cores.

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine

Desperation? by HunterZ · 2008-07-02 09:05 · Score: 3, Interesting

Honestly, I wonder if Intel isn't looking at the expense of pushing per-core speed further and comparing it against the cost of just adding more cores. The unfortunate reality is that the many-core approach really doesn't fit the desktop use case very well. Sure, you could devote an entire core to each process, but the typical desktop user is only interested in the performance of the one process in the foreground that's being interacted with.

It's also worth mentioning that some individual applications just aren't parallelizable to the extent that more than a couple of cores could be exercised for any significant portion of the application's run time.

--
Arguing about vi versus Emacs is like arguing whether it's better to make fire by rubbing sticks or banging rocks.

Even 64 sounds optimistic by Joce640k · 2008-07-02 09:05 · Score: 2, Interesting

I'd be surprised if a desktop PC ever really uses more than eight. Desktop software is sequential, as you said. It doesn't parallelize.

Games will be doing their physics, etc., on the graphics card by then. I don't know if the current fad for doing it on the GPU will go anywhere much but I can see graphics cards starting out this way then going to a separate on-board PPU once the APIs stabilize.

We might *have* 64 cores simply because the price difference between 8 and 64 is a couple of bucks, but they won't be used for much.

--
No sig today...

Intel is building an FPGA by obender · 2008-07-02 09:06 · Score: 2, Interesting

From TFA:

Dozens, hundreds, and even thousands of cores are not unusual design points

I don't think they mean cores like the regular x86 cores, I think they will put an FPGA on the same die together with the regular four/six cores.

Start! What do they mean, start? by 4pins · 2008-07-02 09:10 · Score: 3, Interesting

It has been long taught in theory classes that certain things can be solved in fewer steps using nondeterministic programming. The problem is that you have to follow multiple paths until you hit the right one. With sufficiently many cores the computer can follow all the possible paths at the same time, resulting in a quicker answer. http://en.wikipedia.org/wiki/Non-deterministic_algorithm http://en.wikipedia.org/wiki/Nondeterministic_Programming
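A deterministic machine can emulate the "guess the right path" step by exploring every path, and extra cores let each one take a different set of guesses. Here is a Python sketch using subset-sum, a classic NP-complete problem; the function names are hypothetical. The first `split_depth` include/exclude choices are fixed per worker, and each worker brute-forces the rest. This does not reduce the exponential total work; it only divides that work among cores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Optional


def search_branch(prefix: tuple[bool, ...], nums: list[int],
                  target: int) -> Optional[list[int]]:
    """Try every completion of one fixed prefix of include/exclude choices."""
    rest = len(nums) - len(prefix)
    for suffix in product((False, True), repeat=rest):
        picked = [n for n, take in zip(nums, prefix + suffix) if take]
        if sum(picked) == target:
            return picked
    return None


def parallel_subset_sum(nums: list[int], target: int,
                        split_depth: int = 3) -> Optional[list[int]]:
    """Give each worker a different set of first guesses, mimicking a
    nondeterministic machine that follows all of them at once."""
    depth = min(split_depth, len(nums))
    prefixes = list(product((False, True), repeat=depth))
    with ThreadPoolExecutor(max_workers=len(prefixes)) as pool:
        results = pool.map(lambda p: search_branch(p, nums, target), prefixes)
        for found in results:
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    print(parallel_subset_sum([3, 34, 4, 12, 5, 2], 9))  # [4, 5]
```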

--
I will not mourn that which I never had to lose. - Unknown

Re:We all saw it coming anyway by ClosedSource · 2008-07-02 09:10 · Score: 5, Insightful

"So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall."

I guess you should put "faster" in quotes.

In any case, it is absolutely relevant what programmers think, since any performance improvement that customers actually experience depends on our code.

Historically a primary reason to buy a new computer is because a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC. So why would people buy them?

That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.

Imagine a Beowulf cluster.... by davidwr · 2008-07-02 09:11 · Score: 5, Funny

oh nevermind, what's the point?

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:Useless by everphilski · 2008-07-02 09:14 · Score: 3, Interesting

Parallel programming doesn't have to be hard; in fact, it comes very naturally in a number of domains. For example, in finite element analysis (used in a number of math disciplines, including CFD and various stress-type calculations), the problem domain is broken down into elements, which can naturally be distributed. Calculations within an element are completely independent of the rest of the domain until the system of equations is to be solved, and efficient parallelized matrix solvers are old hat.
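A small Python illustration of that structure. It assumes the simplest case: a 1D bar split into two-node elements, with hypothetical function names. Each element's local stiffness matrix, (EA/L)[[1, -1], [-1, 1]], depends only on that element, so those matrices are computed in parallel. Only the assembly into the global matrix couples neighbours, and the resulting system would then go to a parallel solver (not shown).

```python
from concurrent.futures import ThreadPoolExecutor


def element_stiffness(length: float, ea: float = 1.0) -> list[list[float]]:
    """Local stiffness of a two-node bar element: (EA/L) * [[1, -1], [-1, 1]].
    It needs only this element's own data."""
    k = ea / length
    return [[k, -k], [-k, k]]


def assemble_global(lengths: list[float], ea: float = 1.0) -> list[list[float]]:
    # Independent per-element work: farm it out to a pool.
    with ThreadPoolExecutor() as pool:
        element_matrices = list(pool.map(lambda length: element_stiffness(length, ea), lengths))
    # Assembly: element e connects nodes e and e+1, so neighbours share a node.
    n_nodes = len(lengths) + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e, ke in enumerate(element_matrices):
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                K[nodes[a]][nodes[b]] += ke[a][b]
    return K


if __name__ == "__main__":
    for row in assemble_global([1.0, 1.0, 1.0]):
        print(row)
```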

We've got to keep reminding ourselves: the world we live in runs in parallel, so why shouldn't our computers?

Heat issues by the_olo · 2008-07-02 09:18 · Score: 3, Interesting

How are they going to cope with excessive heat and power consumption? How are they going to dissipate heat from a thousand cores?

When processing power growth was fed by shrinking transistors, heat stayed at a manageable level (well, it gradually increased as more and more elements were packed onto the die, but not linearly). Smaller circuits yielded less heat, despite there being many more of them.

Now we're packing more and more chips into one package instead and shrinkage of transistors has significantly slowed down. So how are they going to pack those thousand cores into a small number of CPUs and manage power and heat output?
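A first-order way to see why many slower cores can run cooler than one fast core uses the standard dynamic-power relation. Here α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The second line assumes, crudely, that voltage can be lowered in proportion to frequency and that the work splits perfectly across N cores. It ignores leakage and the minimum workable voltage, which is where real chips depart from this picture.

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f,
\qquad V \propto f \;\Rightarrow\; P_{\text{core}} \propto f^{3}

P_{N\ \text{cores at } f/N} \;\propto\; N\left(\frac{f}{N}\right)^{3} = \frac{f^{3}}{N^{2}}
```

Under those assumptions, N cores at f/N deliver the same aggregate clock cycles as one core at f for 1/N² of the dynamic power.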

All we need to do now... by DerPflanz · 2008-07-02 09:20 · Score: 3, Interesting

is find out how to program that. I'm a programmer and I know the problems involved in (massively) parallel programming. For a lot of problems, it is either impossible or very hard. See also my essay 'Why does software suck' (Dutch) (Babelfish translation).

--
-- The Internet is a too slow way of doing things, you'd never do without it.

It's all changing too fast by blowhole · 2008-07-02 09:26 · Score: 2, Insightful

I've only been programming professionally for 3 years now, but I'm already shaking in my boots over having to rethink and relearn the way I've done things to accommodate these massively parallel architectures. I can't imagine how scared the old-timers with 20, 30, or more years must be. Or maybe the good ones who are still hacking decades later have already dealt with paradigm shifts and aren't scared at all?

--
"Ask me about Loom"

Re:It's all changing too fast by GatesDA · 2008-07-02 10:13 · Score: 5, Insightful

My dad's been programming for decades, and he's much more used to paradigm shifts than I am. His first programming job was translating assembly from one architecture to another, and now he's a proficient web developer. He understands concurrency and keeps up to date on new developments.
I'm reminded of an anecdote told to me during a presentation. The presenter had been introducing a new technology, and one man had a concern: "I've just worked hard to learn the previous technology. Can you promise me that, if I learn this one, it will be the last one I ever have to learn?" The presenter replied, "I can't promise you that, but I can promise you that you're in the wrong profession."
Re:It's all changing too fast by uncqual · 2008-07-02 10:35 · Score: 4, Interesting

If a programmer has prospered for 20 or 30 years in this business, they probably have adapted to multiple paradigm shifts.

For example, "CPU expensive, memory expensive, programmer cheap" is now "CPU cheap, memory cheap, programmer expensive" -- hence Java et al. (I am sometimes amazed when I casually allocate/free chunks of memory larger than all the combined memory of all the computers at my university - both in the labs and the administration/operational side - but what amazes me is that it doesn't amaze me!)

Actually, some of the "old timers" may be more comfortable with some issues of highly parallel programming than some of the "kids" (term used with respect, we were all kids once!), who have mostly had those issues masked by high-level languages. Comparing "old timers" to "kids" doing enterprise server software, the kids seem much less likely to understand issues like the memory coherence models of specific architectures, the cache contention issues of specific implementations, etc.

Also, too often, the kids make assumptions about the source of performance/timing problems rather than gathering empirical evidence and acting on that evidence. This trait is particularly problematic because when dealing with concurrency and varying load conditions, intuition can be quite unreliable.

Really, it's not all that scary - the first paradigm shift is the hardest!

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:It's all changing too fast by geekoid · 2008-07-02 12:46 · Score: 2, Insightful

We're not scared. All the good ones spit into their hands, brace themselves and say "Bring it on."
Any old-timers who are actually scared need to leave, and don't let your beard get caught in the door on the way out, wuss.
Don't worry about relearning; by the time this hits the market, tools will have been written, and there will be a lot of documentation.
It's going to be a great step in computing... or it will get killed because the tools weren't developed fast enough.

--
The Dunning-Kruger effect explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Re:That's all well and good..... by pimpimpim · 2008-07-02 09:26 · Score: 3, Informative

Bingo. That's where the problem is. I've taken an introductory course on parallel programming (not saying I'm an expert, though), and while the idea of multiprocessor programming is fairly simple, the implementation is amazingly difficult and painful.

Example: "race condition". Say processor one is trying to find the optimal value of variable A, while processor two is doing something different but calls a subfunction that changes variable A; then processor one might keep running forever.
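A minimal Python sketch of the classic lost-update race (a simpler cousin of the example above) and its usual fix, a lock around the read-modify-write. The names are illustrative. Whether the unlocked version actually loses updates on a given run depends on the interpreter and the scheduler, so only the locked result is guaranteed.

```python
import threading


def run(increments_per_thread, threads, use_lock):
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments_per_thread):
            if use_lock:
                with lock:  # Only one thread performs the read-modify-write at a time.
                    counter["value"] += 1
            else:
                # Unprotected: the read and the write can interleave between threads.
                current = counter["value"]
                counter["value"] = current + 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print("unlocked:", run(100_000, 4, use_lock=False))  # may come out below 400000
    print("locked:  ", run(100_000, 4, use_lock=True))   # always 400000
```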

The other main problem is deadlock: processor one needs the final result of variable B to calculate variable A, but processor two needs the final result of variable A to calculate B. Both processors come to a standstill, and the program hangs forever.
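When the circular wait comes from locks (each processor holding one variable while waiting for the other), the usual fix is to make every thread acquire shared resources in one global order. In the hypothetical sketch below, two locks stand in for variables A and B; because both workers take lock_a before lock_b, neither can hold one lock while waiting forever for the other. A genuine data dependency (A needs the final B and B needs the final A) cannot be fixed this way and has to be restructured.

```python
import threading

lock_a = threading.Lock()  # guards variable A
lock_b = threading.Lock()  # guards variable B
state = {"A": 1, "B": 2}


def update_a():
    # Global order: always lock_a, then lock_b.
    with lock_a:
        with lock_b:
            state["A"] = state["A"] + state["B"]


def update_b():
    # Same order here; taking lock_b first could deadlock against update_a.
    with lock_a:
        with lock_b:
            state["B"] = state["B"] * 2


def main():
    threads = [threading.Thread(target=f) for f in (update_a, update_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    # True if both threads finished instead of blocking.
    return all(not t.is_alive() for t in threads)


if __name__ == "__main__":
    print("finished without deadlock:", main())
```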

For simple programs, these things are relatively easy to troubleshoot. But for your huge program package with hundreds of modules, it is almost impossible to know what is happening.

Actually, it is the duty of Intel and co. to find a way to prevent these situations, but there too: what kind of genius is able to program an automated debugger that can find deadlocks and race conditions?
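Deadlock detection at least has a standard building block: record which thread is waiting on which, and look for a cycle in that wait-for graph (database lock managers use this idea). The sketch below is an illustration of that technique, not a description of any Intel tool; the dict-based graph format is an assumption. It only catches deadlocks that have already formed; finding races is a harder problem.

```python
def find_cycle(wait_for):
    """Return a list of threads forming a cycle in the wait-for graph, or None.

    wait_for maps each thread to the threads it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in wait_for}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in wait_for.get(node, ()):
            if color.get(nxt, WHITE) == GREY:
                # Back edge: the cycle is the path from nxt to the current node.
                return stack[stack.index(nxt):]
            if color.get(nxt, WHITE) == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # The example above: P1 waits on P2 for B, P2 waits on P1 for A.
    print(find_cycle({"P1": ["P2"], "P2": ["P1"]}))
```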

--
molmod.com - computing tips from a molecular modeling

Re:Imagine the new math! by doti · 2008-07-02 09:34 · Score: 2, Funny

A lot.

--
factor 966971: 966971

look what happened to ps3 by edxwelch · 2008-07-02 09:37 · Score: 4, Interesting

So now we have a shitload of cores, and all we have to do is wait for the developers to put some multi-threading goodness in their apps... or maybe not.
The PS3 was meant to be faster than any other system because of its multi-core Cell architecture, but in an interview John Carmack said, "Although it's interesting that almost all of the PS3 launch titles hardly used any Cells at all."

http://www.gameinformer.com/News/Story/200708/N07.0803.1731.12214.htm

Interesting challenges by Eravnrekaree · 2008-07-02 09:41 · Score: 2, Interesting

If people are writing their applications using threads, I don't see why more cores should be a big problem. Basically, threads should be used where it is practical, where it makes sense, and where it doesn't make programming much more difficult; in fact, it can make things easier. Rather than requiring some overly complicated re-engineering, threads, when properly used, can lead to programs that are just as easy to understand. They suit a program that does many tasks: processing can usually be parallelised when you have different operations that do not depend on each other's output. A list of instructions that depends on the output of previous instructions must run sequentially and, of course, cannot be threaded or parallelised. Obvious examples of applications that can be threaded are a server, where you have a thread to process data from each socket, and a program that scans multiple files, which can have a thread for processing each file.
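A minimal sketch of the "one task per file" pattern using the standard library's concurrent.futures thread pool (a pool rather than a literal thread per file, so hundreds of files don't spawn hundreds of threads). The word-count task and function names are hypothetical stand-ins. In CPython, threads mainly help when the work is I/O-bound; ProcessPoolExecutor is the drop-in alternative for CPU-bound scanning.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_words(path):
    """Independent per-file task: no shared state, so files can be processed in any order."""
    text = Path(path).read_text(encoding="utf-8")
    return path, len(text.split())


def scan_files(paths, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input paths.
        return dict(pool.map(count_words, paths))


if __name__ == "__main__":
    import sys

    for name, words in scan_files(sys.argv[1:]).items():
        print(f"{words:8d} {name}")
```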

it's.... by thermian · 2008-07-02 09:42 · Score: 4, Funny

OVER 9000!!!!!!11111one

--
A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams

it's not about cores by speedtux · 2008-07-02 09:50 · Score: 2, Interesting

If you put 1000 cores on a chip and plug it into a PC... very little would happen in terms of speedup.

What we need to know is the memory architecture. How is memory allocated to cores? How is data transferred? What are the relative costs of accesses? How are the caches handled?

Without that information, it's pointless to think about hundreds or thousands of cores. And I suspect even Intel doesn't know the answers yet. And there's a good chance that a company other than Intel will actually deliver the solution.
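One way to see why core count alone says little is the roofline model, a standard performance model swapped in here as an illustration with hypothetical numbers: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (operations per byte moved). Once a kernel hits the bandwidth roof, adding cores does not raise the cap.

```python
def attainable_gflops(cores, gflops_per_core, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: min(peak compute, memory bandwidth * arithmetic intensity)."""
    peak_compute = cores * gflops_per_core
    memory_roof = bandwidth_gb_s * flops_per_byte
    return min(peak_compute, memory_roof)


if __name__ == "__main__":
    # Hypothetical chip: 1 GFLOP/s per core, 50 GB/s of shared memory bandwidth,
    # running a kernel that does 0.25 floating-point operations per byte loaded.
    for cores in (8, 64, 1000):
        print(cores, attainable_gflops(cores, 1.0, 50.0, 0.25))
```

Under these assumptions, going from 64 to 1000 cores changes nothing: the memory system, not the core count, sets the limit.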

Re:It's already here. by drinkypoo · 2008-07-02 09:52 · Score: 2, Interesting

Last time I checked, my computer had only one GPU core, which had a multitude of functional units. So does my CPU, in fact, but the GPU has more. Each CPU core has its own "context" (the state of certain registers which store pointers, and the flags register). More CPU cores means more contexts, which means fewer context switches, which means cheaper threads. Pretty simple!

CUDA &c are cool in that they offer you a way to use your video card for non-video applications when it is idle. However, their use is likely to be cyclical. It seems that we go through phases of having lots of custom hardware, and then getting cheap horsepower to throw at problems and thus having less custom hardware and doing more things in software, then having things flop back the other way. The PC was originally an expression of software-heavy use, but these days we have standard graphics processors and physics processors are even gaining some ground. Eventually the processors will take another big jump (having a thousand cores would qualify) and then everyone will want to do all this stuff on the CPU again, because a) it will be able to do it and b) you won't have to mess with two processors to get one job done.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Intel is dead... by fluffykitty1234 · 2008-07-02 09:58 · Score: 2, Interesting

I have two comments:

1) Intel is doing this because they've run out of optimizations on single-core systems; basically, this is the only thing they have left to preserve their market. I expect this time next year you'll see ARM SoCs with 1 GHz+ processors that draw under 1 W of power and sell for under $10. These cores will be changing the low end of the market. Intel won't be able to continue to charge $50 for a processor when you can get the same or better performance for 1/5 the cost. The only real advantage Intel has is that Windows XP/Vista doesn't run on ARM.

2) The Processor Company Graveyard is filled with companies that touted parallel processing solutions that were going to revolutionize the world of computing. Parallel processing is extremely difficult and only fits a subset of computing needs; we will need fast single-processor systems for a long time to come. I wish Intel luck on this endeavor; everyone else has failed miserably.

Profit!!! by DeVilla · 2008-07-02 10:03 · Score: 5, Funny

Hi. I make processors. I know a lot about processors. I think a big change is coming to processors. I think you should learn to use a lot of processors. A whole lot of processors. You need more processors. Oh, and did I tell you I make processors?

Re:I'm not bitter. by GatesDA · 2008-07-02 10:07 · Score: 2, Informative

They'll have an excuse if we have 3D monitors at that point

3D monitors already exist and are available for purchase; there are even some that don't need glasses. To go with those, nVidia has stereo drivers up on their website that will work on all their cards and with most games. (Last I checked, ATI's stereo drivers only work on their workstation cards).

To make a game work in 3D, the graphics card just renders two images -- one for each eye; that's not enough work to be used as an excuse for poor performance. Of course, you can always increase the size of armies and such if you WANT to lower performance. They'll find a way.

http://en.wikipedia.org/wiki/Autostereoscopy

you mean SGI by ArchieBunker · 2008-07-02 10:08 · Score: 4, Insightful

SGI and/or Cray were using NUMA a decade ago.

--
Only the State obtains its revenue by coercion. - Murray Rothbard

Cores? by mugnyte · 2008-07-02 10:20 · Score: 3, Interesting

Can't they just make the existing ones go faster? Seriously, if I wanted to design architectures around thousands of independent threads of execution, I'd start with communication speeds, not node count.

It's already easy to spawn thread armies that peg all I/O channels. Where is all this "work" you can do without any I/O?

I think Intel had better start thinking of "tens, hundreds or even thousands" of bus speed multipliers on their napkin drawings.

Aside from some heavily processing-dependent applications (graphics, complex mathematical models, etc.), the world needs petabyte/sec connectivity, not instruction-set munching.

Databases and implementation-neutrality by Tablizer · 2008-07-02 10:23 · Score: 4, Interesting

Databases provide a wonderful opportunity to apply multi-core processing. The nice thing about a (good) database is that queries describe what you want, not how to go about getting it. Thus, the database can potentially split the load up to many processes and the query writer (app) does not have to change a thing in his/her code. Whether a serial or parallel process carries it out is in theory out of the app developer's hair (although dealing with transaction management may sometimes come into play for certain uses.)

However, query languages may need to become more general-purpose in order to have our apps depend on them more, not just for business data. For example, built-in graph (network) and tree traversal may need to be added to and/or standardized in query languages. And we may need to clean up the weak points of SQL and create more dynamic DBs to better match dynamic languages and scripting.

Being a DB-head, I've discovered that a lot of processing can potentially be converted into DB queries. That way one is not writing explicit pointer-based linked lists etc., which lock one into a difficult-to-parallelize implementation.

Relational engines used to be considered too bulky for many desktop applications. This is partly because they make processing go through a DB abstraction layer and thus are not using direct RAM pointers. However, the flip-side of this extra layer is that they are well-suited to parallelization.
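A minimal sketch of why declarative queries parallelize well: the caller only states a filter and a grouping key, and a hypothetical engine function (run_query, not a real database API) is free to split the rows into partitions, aggregate each partition in a worker, and merge the partial results. It only illustrates the "what, not how" contract described above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(rows, where, key):
    """Aggregate one partition: COUNT(*) ... WHERE where(row) GROUP BY key(row)."""
    return Counter(key(r) for r in rows if where(r))


def run_query(rows, where, key, partitions=4):
    """Hypothetical engine: the caller never sees how the work is split."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        for part in pool.map(lambda c: partial_count(c, where, key), chunks):
            total.update(part)  # Counts from separate partitions merge by addition.
    return dict(total)


if __name__ == "__main__":
    orders = [
        {"region": "EU", "amount": 120},
        {"region": "US", "amount": 80},
        {"region": "EU", "amount": 40},
        {"region": "US", "amount": 300},
        {"region": "APAC", "amount": 150},
    ]
    # Roughly: SELECT region, COUNT(*) FROM orders WHERE amount > 100 GROUP BY region
    print(run_query(orders, where=lambda r: r["amount"] > 100, key=lambda r: r["region"]))
```

The same call would return the same answer with partitions=1; that freedom is what lets an engine use however many cores it has.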

--
Table-ized A.I.

Re:Databases and implementation-neutrality by Shados · 2008-07-02 10:54 · Score: 4, Informative

By "a lot of processing can potentially be converted into DB queries", what you discovered is functional programming :) LINQ in .NET 3.5/C# 3.0 is an example of functional programming that is made to look like DB queries, but it isn't the only way. It is a LOT easier to convert that stuff and optimize it for the environment (like how SQL is processed), since it describes the "what" more than the "how". It is already done, and one (out of many examples) is Parallel LINQ, which smartly executes LINQ queries in parallel, optimized for the number of cores, etc. (And I'm talking about LINQ in the context of in-memory processing, not LINQ to SQL, which simply converts LINQ queries into SQL ones.)
Functional programming, tied with the concept of transactional memory to handle concurrency, is a nice medium-term solution to the multi-core problem.
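Python has no LINQ, but the idea can be sketched with side-effect-free functions: because the hypothetical `score` below depends only on its argument, a thread pool's `map` can evaluate it in any order and still return exactly what the serial version does, much as PLINQ runs a LINQ query across cores. Transactional memory has no standard-library equivalent here and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def score(n):
    """Pure function: no shared state, so evaluation order cannot change the result."""
    return sum(d * d for d in range(n % 100))


def serial(data):
    return [score(x) for x in data if x % 2 == 0]


def parallel(data, workers=4):
    evens = [x for x in data if x % 2 == 0]  # the "where" step
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, evens))  # the "select" step, order preserved


if __name__ == "__main__":
    data = list(range(1000))
    print(serial(data) == parallel(data))
```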

Funny... by socialhack · 2008-07-02 10:26 · Score: 2, Funny

Back in 2002 when I was working for a software company that was using OCR on hundreds of thousands of images, I was pushing clustered computing. I had an engineer (not one of ours) tell me that it would probably never be practical to develop software to take advantage of multiple processors. I wonder what he would say today.

--
Never leave a dead horse unbeaten!

I can see this being helpful by subspacemsg · 2008-07-02 10:26 · Score: 2, Interesting

Multi-core can be useful with existing programming models. Imagine getting rid of the context switcher forever and executing threads/processes on a new core every time an application is launched or a thread is spawned. The OS can incorporate a core manager similar to a memory manager.

This is an effective method as long as the processor is able to manage its load properly internally.

I.e., if a processor has, say, 100 cores with a combined processing capacity per unit time of Z, there are X threads each on its own core, and the processing capacity of one core per unit time is Y, then XY must always equal Z. The challenge is how you manage core loads within the CPU; if Intel can solve that, uber-multicore can really take off.
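A minimal sketch of the "core manager" idea, assuming a hypothetical OS component that places each new thread on the currently least-loaded core (a greedy heuristic, not how any particular kernel scheduler works). The class name and the per-thread load numbers are illustrative.

```python
import heapq


class CoreManager:
    """Hypothetical core manager: assigns each new thread to the least-loaded core."""

    def __init__(self, n_cores):
        # Heap of (current_load, core_id) so the least-loaded core is always on top.
        self._heap = [(0.0, core) for core in range(n_cores)]
        heapq.heapify(self._heap)
        self.placement = {}

    def spawn(self, thread_id, load):
        current, core = heapq.heappop(self._heap)
        self.placement[thread_id] = core
        heapq.heappush(self._heap, (current + load, core))
        return core

    def loads(self):
        return {core: load for load, core in self._heap}


if __name__ == "__main__":
    mgr = CoreManager(n_cores=4)
    for tid, load in enumerate([5, 3, 3, 2, 2, 1]):
        print(f"thread {tid} (load {load}) -> core {mgr.spawn(tid, load)}")
    print(mgr.loads())
```

Greedy placement keeps cores roughly balanced but is not optimal; threads that change their load over time would also need migration, which this sketch ignores.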

Re:Microsoft's reply by David+Greene · 2008-07-02 10:29 · Score: 2, Informative

That's no joke. It's not at all unusual to have to wait hours for tens of thousands of core files to be produced on large HPC machines. Debugging at scale is a really, really hard problem.

--

My first thought... by Druppy · 2008-07-02 10:51 · Score: 4, Funny

Is it bad that my first thought when I saw this was: "But, my code already generates thousands of cores..."

Difference by XanC · 2008-07-02 11:09 · Score: 2, Insightful

What's different this time may be that nobody else has anything better. Last time, AMD64 was the easier solution, and it clobbered Itanium. Can AMD (or anybody) simply choose to keep making single cores faster, or is multi-core the way CPUs really must go from here?

so, Intel made risc passé... by DragonTHC · 2008-07-02 12:02 · Score: 2, Insightful

and now they're bringing it back?

we all learned how 1000 cores don't matter if each core can only process a simplified instruction set, compared to 2 cores that can handle more data per thread.

this is basic computer design here people.

--
They're using their grammar skills there.

Can't you only have 1 core? by kahanamoku · 2008-07-02 12:03 · Score: 2, Informative

By definition, isn't a core just the middle/root of something? If you have more than 1 core, shouldn't the term really be changed to something closer to what it represents?

--
----- Concentrate on promoting more than demoting.

Re:kernel tasker responsibility by Skapare · 2008-07-02 12:05 · Score: 2, Interesting

I think that gcc should insert code to control memory leaks and process safety and the kernel should be in charge of tasking between cores.

Please limit this desire to languages like Java, Python, and Ruby. We don't need this in C. If you can't program without it, you shouldn't be programming in C.

--
now we need to go OSS in diesel cars

Re:Hey remember the 1980's and the Amiga? by ergo98 · 2008-07-02 12:40 · Score: 2, Insightful

We have come full circle now with dual core and up chips and the GPU being built into the CPU now, back to the Amiga, which was a superior system design.

How is that back to the Amiga?

The PC platform hit Amiga levels well over a decade and a half ago, with dedicated graphics hardware, dedicated audio hardware, dedicated network hardware, a numerical coprocessor, and so on. People need to stop claiming every new change finally brings things back to the Amiga. That argument is terribly old.

And yeah I was into the Amiga and Atari ST and Mac Classic back in those days, but then I moved on.

Bill Gates was just misquoted by Growlor · 2008-07-02 13:38 · Score: 5, Funny

He meant 640k CORES should be enough for anybody.

Re:That's all well and good..... by mrchaotica · 2008-07-02 14:14 · Score: 2, Informative

but can we PLEASE work on getting apps to run on more than just ONE core/processor for now?

Why?

The kind of parallelism needed for a few cores (coarse-grained task parallelism) is entirely different than the kind of parallelism needed for hundreds or thousands of cores (fine-grained data parallelism). Designing for a few cores won't do us a damn bit of good when we have hundreds or thousands.
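A minimal sketch of the distinction, with hypothetical workloads: task parallelism runs a few different jobs side by side, so it can never use more workers than there are jobs; data parallelism applies one operation to many chunks of the data, so the available parallelism grows with the input. Threads are used for portability; real CPU speedup in CPython would need processes.

```python
from concurrent.futures import ThreadPoolExecutor


def task_parallel(text):
    """Coarse-grained: three different jobs, so at most three workers are ever useful."""
    jobs = {
        "words": lambda: len(text.split()),
        "lines": lambda: text.count("\n") + 1,
        "chars": lambda: len(text),
    }
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        return {name: f.result() for name, f in futures.items()}


def data_parallel(values, chunk_size=1000, workers=16):
    """Fine-grained: the same operation on every chunk; more data means more chunks."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: sum(v * v for v in c), chunks))


if __name__ == "__main__":
    print(task_parallel("a b c\nd e"))
    print(data_parallel(list(range(10_000))))
```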

--

"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

Re:Yeah, right. by Doctor+Faustus · 2008-07-02 14:41 · Score: 2, Insightful

The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, that was based on exactly these assumptions and failed miserably because of them.
With Itanium, they were trying to say compiler improvements could handle it invisibly, with no work from the application programmers. Taking advantage of more than two cores (since one can take care of other programs that would have slowed down your app) is going to take conscious thought about what can and can't be parallel. Taking advantage of more than a handful is going to take more fundamental shifts in how we program. They're asking a lot more this time.

On the other hand, you could easily opt out of Itanium. Now, this is the only way your programs are going to get much future processing improvement. Ever. No matter who you're buying CPUs from.

Gaming? by phorm · 2008-07-02 17:31 · Score: 2, Informative

I'd say that it could have a rather hefty impact on the graphics industry (though to be fair, both tend to share tech fairly regularly as it is) as well as many others.

How about servers? If you have 1000 cores, and 1000 clients connecting through the network, then each core could service a client (though depending on what they're doing, IO and other issues also rear their heads). Another nice aspect would be that if you could fix a process to a certain # of cores, you could always be sure that it wouldn't max out your entire CPU capacity.

Re:Gaming? by walshy007 · 2008-07-02 20:08 · Score: 2, Informative

"Another nice aspect would be that if you could fix a process to a certain # of cores": you already can in Linux. schedtool lets you set hard CPU affinities per process, so you can restrict it to certain cores if you like.
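The same thing can be done from inside a program. Python's standard library exposes the Linux affinity calls as os.sched_getaffinity and os.sched_setaffinity, available only on platforms that support them, so the sketch checks first. The function name is hypothetical; it pins the current process to its lowest-numbered allowed CPU, reads the mask back, and restores the original mask.

```python
import os


def pinned_mask_demo():
    """Pin this process to one allowed CPU, read the mask back, then restore it.

    Returns (original_mask, pinned_mask), or None where affinity calls are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: these calls are not provided
    original = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    try:
        os.sched_setaffinity(0, {min(original)})
        pinned = os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, original)  # always give the process its CPUs back
    return original, pinned


if __name__ == "__main__":
    result = pinned_mask_demo()
    if result is None:
        print("CPU affinity not supported on this platform")
    else:
        original, pinned = result
        print("allowed CPUs:", sorted(original), "while pinned:", sorted(pinned))
```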

Missing the point by Orgasmatron · 2008-07-03 01:58 · Score: 2, Insightful

The point is that this is going to happen, whether anyone likes it or not.

CPU clock speeds ran into the brick wall a few years ago. Here is a chart showing CPU clocks from 1993 to 2005.

There have been no major performance improvements from that direction for the last few years, and probably won't be any more without a major breakthrough in semiconductors.

Moore's law is about transistor counts, and shows no real signs of stopping. Every 18 to 24 months, we double the number of transistors on a given wafer/die. The transition to 64-bit CPUs used a generation or two of those extra transistors, but we aren't likely to move to 128 bits soon. We are already pretty deep into the diminishing-returns curve for on-die cache.

What is left to consume those transistors?

More cores. Lots more cores. If you replace your CPU every 2 years, you can pretty much bet that each one you buy for the next decade or so will have twice as many cores as the one it is replacing.

And if developers and compilers get good at managing parallel code (and they have no choice in this), you can expect core counts to go up even faster than doubling every couple of years.
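The arithmetic behind "twice as many cores every two years" compounds quickly: a decade is five doublings, a 32-fold increase. A minimal sketch, where the starting count of 4 cores is a hypothetical baseline rather than a figure from the post:

```python
def projected_cores(start_cores, years, doubling_period_years=2.0):
    """Core count if it doubles every `doubling_period_years` (pure exponential model)."""
    return start_cores * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (0, 2, 4, 6, 8, 10):
        print(years, round(projected_cores(4, years)))
```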

--
See that "Preview" button?

Slashdot Mirror

132 of 638 comments