Intel Says to Prepare For "Thousands of Cores"
Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed its plans for the future, and they go well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousands of cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence.'"
- and - oh my God - it's full of cores!
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't. If you have 1,000 half-gigahertz cores running a 3D simulation, you may be able to get 875 FPS out of Doom X at 1920x1440, but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?
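Amdahl's law (a standard formula, not something the poster cited) puts a number on this worry: if a fraction p of a program can run in parallel and the rest is strictly sequential, n cores can speed it up by at most 1 / ((1 - p) + p/n). A minimal Python sketch; the values of p and n are illustrative assumptions, not measurements:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        for n in (4, 64, 1000):
            print(f"p={p:<5} cores={n:<5} speedup={amdahl_speedup(p, n):6.1f}")
```

With p = 0.9 the bound is about 9.9x even on 1,000 cores, because the sequential 10% dominates; the slow, plodding part doesn't get any faster.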
Start a happiness pandemic
As if Oracle licensing wasn't complicated enough already...
-- "Other than that, how was the play, Mrs. Lincoln?"
If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?
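A back-of-the-envelope sketch of the starvation concern. The channel count, per-channel bandwidth, and clock rates below are hypothetical figures chosen for illustration, not specs of any Intel part, and caches are ignored entirely:

```python
def bytes_per_core_cycle(channels: int, gb_per_s_per_channel: float,
                         cores: int, core_ghz: float) -> float:
    """Average main-memory bandwidth per core, in bytes per core clock cycle.

    Assumes total bandwidth is shared evenly among cores and ignores caches.
    """
    total_bytes_per_s = channels * gb_per_s_per_channel * 1e9
    per_core_bytes_per_s = total_bytes_per_s / cores
    return per_core_bytes_per_s / (core_ghz * 1e9)


if __name__ == "__main__":
    # Hypothetical: 4 memory channels at ~10.6 GB/s each.
    print(bytes_per_core_cycle(4, 10.6, cores=4, core_ghz=3.0))     # a few fast cores
    print(bytes_per_core_cycle(4, 10.6, cores=1000, core_ghz=0.5))  # many slow cores
```

Under these assumptions each of the 1,000 cores gets well under one byte per cycle from main memory, so any core whose data isn't in on-chip cache or local memory would spend most of its time waiting.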
I'm perfect in every way, except for my humility.
At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general and tried to predict what computers (especially supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel: most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc., is not going to benefit much from having all that compute power (gamers are obviously an exception). Thus, he predicted, processors would max out at a "relatively" low number of cores; 64 was his prediction.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Are we just looking at crazy-ass multithreading, or do you mean we need some special API? I think it's really the compiler gurus who are going to make the difference here; 99% of the world can't figure out how to debug multithreaded apps. I'm only moderately successful with it if I build small single-process "kernels" (to steal a graphics term) that each process a work item, plus a loader that keeps track of work items. Then I fire up a bunch of threads and feed the cloud a bunch of discrete work items. Synchronizing threads is no fun.
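The "kernel plus loader" pattern maps onto a shared work queue drained by a fixed set of worker threads. A minimal Python sketch; process_work_item and run_pool are hypothetical names, and in CPython a CPU-bound kernel would want concurrent.futures.ProcessPoolExecutor instead of threads because of the global interpreter lock:

```python
import queue
import threading


def process_work_item(item: int) -> int:
    """The 'kernel': a small, self-contained function with no shared state."""
    return item * item


def run_pool(work_items, num_threads: int = 4) -> list:
    """The 'loader': queue every work item, then let worker threads drain the queue."""
    items = list(work_items)
    work: queue.Queue = queue.Queue()
    for idx, item in enumerate(items):
        work.put((idx, item))

    results = {}
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                idx, item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            value = process_work_item(item)
            with results_lock:  # the only shared write
                results[idx] = value

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(items))]


if __name__ == "__main__":
    print(run_pool(range(10)))
```

Because each kernel touches no shared state except when it stores its result under a lock, that one critical section (plus the queue, which synchronizes itself) is the only synchronization to get right.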
meh
Heck, my original computer had 229376 cores. They were arranged in 28k 16 bit words.
I'm an American. I love this country and the freedoms that we used to have.
640K cores should be enough for anybody.
It's a good idea, somewhat the same idea the Cell chip has going for it (and, well, Phenom X3s). You make a product with lots of redundant units so that when some inevitably fail, the percentage lost is much lower.
If there are 1000 cores on a chip and 100 go bad, you're still only losing a *maximum* of 10% of performance, whereas when you have 2 or 4 cores and 1 or 2 go bad, you take a performance hit of essentially 50%. That brings costs down because yields go up dramatically.
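The yield argument can be made concrete with a simple binomial model. Assume, purely for illustration, that each core is independently defective with probability 1%, and compare a chip that is sellable only if every core works with one that is sellable as long as enough cores work (disabling the bad ones):

```python
from math import comb


def prob_at_least_k_good(n_cores: int, k_required: int, p_core_good: float) -> float:
    """P(at least k_required of n_cores work), assuming independent per-core defects."""
    q = 1.0 - p_core_good
    return sum(comb(n_cores, i) * p_core_good**i * q**(n_cores - i)
               for i in range(k_required, n_cores + 1))


if __name__ == "__main__":
    p = 0.99  # hypothetical per-core yield
    print("4-core chip, all 4 must work:  ", prob_at_least_k_good(4, 4, p))
    print("1000-core chip, all must work: ", prob_at_least_k_good(1000, 1000, p))
    print("1000-core chip, 900 must work: ", prob_at_least_k_good(1000, 900, p))
```

Under this model a design that needs all 1,000 cores working is almost never sellable (0.99^1000 is about 0.00004), while one that tolerates 100 dead cores almost always is; that tolerance is what drives yields up.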
Supercomputers already have many more than thousands of cores. The IBM Blue Gene/P can have up to 1,048,576 cores. What Intel is probably talking about is bringing that level of parallel computing to smaller computers.
Well, parallel programming is hard. It's not so hard that it can't be done, but it's harder than sequential programming. Unless your app gains a specific advantage from parallel programming, it isn't worth the effort in the first place. The nice thing, however, would be that you could run each process on a separate core, and there wouldn't be any task switching needed. This would speed things up quite a bit. Also, if you locked a process or thread to each core, then one slowdown wouldn't take out the entire system.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
In order to utilize mega-core processors, I believe that we need to rethink the way we program computers. Instead of using imperative programming languages (e.g., C, C++, Java), we might need to look at declarative languages like Erlang, Haskell, F# and so on. Read more about this at http://olvemaudal.wordpress.com/2008/01/04/erlang-gives-me-positive-vibes/
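The property that makes Erlang-style languages attractive here is share-nothing message passing: state is owned by one process, and everyone else talks to it through messages, so there is nothing to lock. A minimal imitation in Python with threads and queues; it is an analogy only, not how Erlang is implemented, and the message tuples are a hypothetical protocol:

```python
import queue
import threading


def counter_actor(inbox: queue.Queue) -> None:
    """Owns 'count' privately; other threads can only send it messages."""
    count = 0
    while True:
        msg = inbox.get()
        if msg[0] == "add":
            count += msg[1]
        elif msg[0] == "get":
            msg[1].put(count)  # reply on the sender's own queue
        elif msg[0] == "stop":
            return


def send_adds(inbox: queue.Queue, n: int) -> None:
    for _ in range(n):
        inbox.put(("add", 1))


def demo(n_senders: int = 4, adds_each: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox,))
    actor.start()

    senders = [threading.Thread(target=send_adds, args=(inbox, adds_each))
               for _ in range(n_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    reply: queue.Queue = queue.Queue()
    inbox.put(("get", reply))  # queued after every "add", so it sees them all
    total = reply.get()
    inbox.put(("stop",))
    actor.join()
    return total


if __name__ == "__main__":
    print(demo())
```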
Say you have a slow, plodding sequential process. If you reach a point where there are several possibilities and you have an abundance of cores, you can start work on each of the possibilities while you're still deciding which possibility is actually the right one. Many CPU's already incorporate this sort of logic. It is, however, rather wasteful of resources and provides a relatively modest speedup. Applying it at a higher level should work, in principle, although it obviously isn't going to be practical for many problems.
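A minimal sketch of that idea at the application level: start the slow decision and both possible continuations at once, then keep whichever result the decision selects. The functions and sleep() delays are hypothetical stand-ins for real work:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_decide(x: int) -> bool:
    """Hypothetical slow condition: is x even?"""
    time.sleep(0.2)
    return x % 2 == 0


def branch_even(x: int) -> int:
    time.sleep(0.2)  # hypothetical expensive work
    return x // 2


def branch_odd(x: int) -> int:
    time.sleep(0.2)  # hypothetical expensive work
    return 3 * x + 1


def speculative_step(x: int) -> int:
    """Run the decision and both branches at once; keep only the chosen result."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        decision = pool.submit(slow_decide, x)
        if_even = pool.submit(branch_even, x)
        if_odd = pool.submit(branch_odd, x)
        chosen = if_even if decision.result() else if_odd
        return chosen.result()


if __name__ == "__main__":
    start = time.perf_counter()
    print(speculative_step(7))
    print(f"{time.perf_counter() - start:.2f} s")
```

The losing branch still runs to completion and its result is thrown away, which is exactly the waste described; the payoff is that the wall-clock time is roughly one delay instead of two.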
I do see this move by Intel as a direct follow-up to their plans to negate the processing advantages of today's video cards. Intel wants people running general-purpose code to run it on their general-purpose CPUs, not on their video cards using CUDA or the like. If the future of video game rendering is indeed ray tracing (an embarrassingly parallel algorithm if ever there was one), then this move will also position Intel to compete directly with Nvidia for the raw processing power market.
One thing is for sure, there's a lot of coding to do. Very few programs currently make effective use of even 2 cores. Parallelization of code can be quite tricky, so hopefully tools will evolve that will make it easier for the typical code-monkey who's never written a parallel algorithm in his life.
I'm all for newer, faster processors. Hell, I'm all for processors with lots of cores that can be used, but wouldn't completely redoing all of the software libraries and such that we've got used to cause a hell of a divide in developers?
Sure, if you only develop on an x86 platform, you're fine, but what if you want to write software for ARM or PPC? Processors that might not adopt the "thousands of cores" model?
Would it not be better to design a processor that can intelligently utilise single threads across multiple cores? (I know this isn't an easy task, but I don't see it being much harder than what Intel is proposing here).
Or is this some long-time plan by intel to try to lock people into their platforms even more?
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
In the Soviet Union
Oh wait... the Soviet Union already broke into smaller cores.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Honestly, I wonder if Intel isn't looking at the expense of pushing per-core speed further and comparing it against the cost of just adding more cores. The unfortunate reality is that the many-core approach really doesn't fit the desktop use case very well. Sure, you could devote an entire core to each process, but the typical desktop user is only interested in the performance of the one process in the foreground that's being interacted with.
It's also worth mentioning that some individual applications just aren't parallelizable to the extent that more than a couple of cores could be exercised for any significant portion of the application's run time.
Arguing about vi versus Emacs is like arguing whether it's better to make fire by rubbing sticks or banging rocks.
I'd be surprised if a desktop PC ever really uses more than eight. Desktop software is sequential, as you said. It doesn't parallelize.
Games will be doing their physics, etc., on the graphics card by then. I don't know if the current fad for doing it on the GPU will go anywhere much but I can see graphics cards starting out this way then going to a separate on-board PPU once the APIs stabilize.
We might *have* 64 cores simply because the price difference between 8 and 64 is a couple of bucks, but they won't be used for much.
No sig today...
Dozens, hundreds, and even thousands of cores are not unusual design points
I don't think they mean cores like the regular x86 cores, I think they will put an FPGA on the same die together with the regular four/six cores.
It has long been taught in theory classes that certain things can be solved in fewer steps using nondeterministic programming. The problem is that you have to follow multiple paths until you hit the right one. With sufficiently many cores, the computer can follow all the possible paths at the same time, resulting in a quicker answer. http://en.wikipedia.org/wiki/Non-deterministic_algorithm http://en.wikipedia.org/wiki/Nondeterministic_Programming
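One concrete reading of this: a nondeterministic algorithm "guesses" a choice and then verifies it, and a many-core machine can emulate that by giving each core a different guess. The sketch below splits a subset-sum search on its first few take/skip choices and returns the first branch that finds a solution. The function names and the prefix_bits split are assumptions for illustration, and threads stand in for cores (real CPU parallelism in CPython would need processes):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product
from typing import Optional


def search_branch(numbers: list[int], target: int, prefix: tuple) -> Optional[list[int]]:
    """Exhaustively try every completion of one guessed prefix of take/skip choices."""
    rest = len(numbers) - len(prefix)
    for suffix in product((0, 1), repeat=rest):
        choice = prefix + suffix
        picked = [n for n, take in zip(numbers, choice) if take]
        if sum(picked) == target:
            return picked
    return None


def parallel_subset_sum(numbers: list[int], target: int,
                        prefix_bits: int = 3) -> Optional[list[int]]:
    """Give each branch (one guess for the first prefix_bits choices) its own worker."""
    prefix_bits = min(prefix_bits, len(numbers))
    prefixes = list(product((0, 1), repeat=prefix_bits))
    with ThreadPoolExecutor(max_workers=len(prefixes)) as pool:
        futures = [pool.submit(search_branch, numbers, target, p) for p in prefixes]
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:
                return result  # first branch to succeed wins
    return None


if __name__ == "__main__":
    print(parallel_subset_sum([3, 34, 4, 12, 5, 2], 9))
```

The total work is still exponential; the cores only divide it up, and in this sketch the remaining branches run to completion after a solution is found.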
I will not mourn that which I never had to lose. - Unknown
"So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall."
I guess you should put "faster" in quotes.
In any case, it is absolutely relevant what programmers think, since any performance improvements that customers actually experience depend on our code.
Historically, a primary reason to buy a new computer is that a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC. So why would people buy them?
That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.
oh nevermind, what's the point?
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Parallel programming doesn't have to be hard; in fact, it comes very naturally in a number of domains. For example, in finite element analysis (used in a number of math disciplines, including CFD and various stress-type calculations), the problem domain is broken down into elements which can naturally be distributed. Calculations within an element are completely independent of the rest of the domain until the system of equations is to be solved, and efficient parallelized matrix solvers are old hat.
We've got to keep reminding ourselves: the world we live in runs in parallel, so why shouldn't our computers?
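A minimal 1D illustration of that structure, assuming linear two-node bar elements with stiffness k and length h (a textbook simplification, not any particular FEA package): each element's local matrix is computed independently, here in a thread pool, and only the assembly step combines them.

```python
from concurrent.futures import ThreadPoolExecutor


def element_stiffness(k: float, h: float) -> list[list[float]]:
    """Local 2x2 stiffness matrix of one linear two-node bar element."""
    c = k / h
    return [[c, -c], [-c, c]]


def assemble_global(elements: list[tuple[float, float]]) -> list[list[float]]:
    """elements[e] = (k, h); element e connects node e to node e + 1."""
    ks = [k for k, _ in elements]
    hs = [h for _, h in elements]
    # Step 1: every local matrix depends only on its own element,
    # so they can all be computed at the same time.
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(element_stiffness, ks, hs))
    # Step 2: scatter-add each local matrix into the global one.
    n_nodes = len(elements) + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e, ke in enumerate(local):
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                K[nodes[a]][nodes[b]] += ke[a][b]
    return K


if __name__ == "__main__":
    for row in assemble_global([(1.0, 1.0)] * 3):
        print(row)
```

In a real solver the assembled system would then go to a parallel sparse solver; the point here is only that the per-element work has no cross-element dependencies.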
How are they going to cope with excessive heat and power consumption? How are they going to dissipate heat from a thousand cores?
When processing power growth was fed by shrinking transistors, the heat stayed at a manageable level (well, it gradually increased with packing more and more elements on the die, but the function wasn't linear). Smaller circuits yielded less heat, despite there being many more of them.
Now we're packing more and more chips into one package instead and shrinkage of transistors has significantly slowed down. So how are they going to pack those thousand cores into a small number of CPUs and manage power and heat output?
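Part of the answer is the usual first-order model of CMOS dynamic power, P ≈ α·C·V²·f (activity factor α, switched capacitance C, supply voltage V, clock frequency f). As a worked example under assumed numbers: if running two cores at half the clock lets the supply voltage drop to 0.8V (a hypothetical figure), then, ignoring leakage and assuming the work splits perfectly between the cores, the pair delivers the same throughput for about 64% of the power:

```latex
\begin{aligned}
P_{\mathrm{dyn}} &\approx \alpha C V^{2} f \\
P_{\text{1 core}} &= \alpha C V^{2} f \\
P_{\text{2 cores}} &= 2 \cdot \alpha C (0.8V)^{2} \cdot \frac{f}{2} = 0.64\,\alpha C V^{2} f
\end{aligned}
```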
is find out how to program that. I'm a programmer and I know the problems that are involved in (massive) parallel programming. For a lot of problems, it is either impossible or very hard. See also my essay 'Why does software suck' (Dutch) (babelfish translation).
-- The Internet is a too slow way of doing things, you'd never do without it.
I've only been programming professionally for 3 years now, but already I'm shaking in my boots over having to rethink and relearn the way I've done things to accommodate these massively parallel architectures. I can't imagine how scared the old-timers with 20, 30, or more years must be. Or maybe the good ones who are still hacking decades later have already had to deal with paradigm shifts and aren't scared at all?
"Ask me about Loom"
Example: a "race condition." Say processor one is trying to find the optimal value of variable A, and processor two is doing something different but calls some subfunction that changes variable A; then processor one might keep on running forever.
The other main problem is deadlock: processor one needs the final result of variable B to calculate variable A, but processor two needs the final result of variable A to calculate B. Both processors come to a standstill, and the program hangs forever.
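The standard remedy for this kind of race is to make each read-modify-write of the shared variable atomic with a lock. A minimal Python sketch; SharedValue, hammer, and run_demo are hypothetical names, and the counter stands in for "variable A":

```python
import threading


class SharedValue:
    """A shared counter whose updates are protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        # The read, the addition, and the write must happen as one step;
        # the lock keeps other threads out until all three are done.
        with self._lock:
            self.value += amount


def hammer(shared: SharedValue, times: int) -> None:
    for _ in range(times):
        shared.add(1)


def run_demo(n_threads: int = 8, times: int = 10_000) -> int:
    shared = SharedValue()
    threads = [threading.Thread(target=hammer, args=(shared, times))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.value


if __name__ == "__main__":
    print(run_demo())
```

Without the lock, two threads can read the same old value and one update is silently lost; with it, the result is always 8 × 10,000 = 80,000.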
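One common way to rule out this kind of deadlock by construction is to make every thread acquire locks in the same global order. A minimal Python sketch, in which the ordering key (id() of the lock object) and the function names are assumptions for illustration:

```python
import threading
from contextlib import ExitStack


def update_both(first, second, state: dict, rounds: int) -> None:
    for _ in range(rounds):
        with ExitStack() as stack:
            # Sort the locks into one global order before taking them, so two
            # threads can never each hold one lock while waiting for the other.
            for lock in sorted((first, second), key=id):
                stack.enter_context(lock)
            state["A"] += 1
            state["B"] += 1


def run_demo(rounds: int = 1000) -> dict:
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    state = {"A": 0, "B": 0}
    # The two threads name the locks in opposite orders, which could
    # deadlock if each acquired them in the order given.
    t1 = threading.Thread(target=update_both, args=(lock_a, lock_b, state, rounds))
    t2 = threading.Thread(target=update_both, args=(lock_b, lock_a, state, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return state


if __name__ == "__main__":
    print(run_demo())
```

This prevents the cycle rather than detecting it; tools that find deadlocks after the fact are a separate (and harder) problem.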
For simple programs, these things are relatively easy to troubleshoot. But for your huge program package with hundreds of modules, it is almost impossible to know what is happening.
Actually, it is the duty of Intel and co. to find a way to prevent these situations, but even there, what kind of genius is able to program an automated debugger that can find deadlocks and race conditions?
molmod.com - computing tips from a molecular modeling
A lot.
factor 966971: 966971
So now we have a shit load of cores all we have to do is wait for the developers to put some multi-threading goodness in their apps.... or maybe not.
The PS3 was meant to be faster than any other system because of its multi-core Cell architecture, but in an interview John Carmack said, "Although it's interesting that almost all of the PS3 launch titles hardly used any Cells at all."
http://www.gameinformer.com/News/Story/200708/N07.0803.1731.12214.htm
If people are writing their applications using threads, I don't see why there should be a big problem with more cores. Basically, threads should be used where it is practical, makes sense, and does not make programming that much more difficult; in fact, threads can make things easier. Rather than requiring some overly complicated re-engineering, threads, when properly used, can lead to programs that are just as easy to understand. They can be used for a program that does many tasks: processing can usually be parallelised when you have different operations which do not depend on each other's output. A list of instructions that depends on the output of previous instructions must run sequentially and, of course, cannot be threaded or parallelised. Obvious examples of applications that can be threaded are a server, where you have a thread to process data from each socket, and a program which scans multiple files, which can have a thread for processing each file.
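For the file-scanning case, a thread pool gives the thread-per-file structure without spawning an unbounded number of threads. A minimal Python sketch; count_lines is a hypothetical stand-in for whatever per-file processing is needed, and the demo writes its own temporary files:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def count_lines(path: str) -> int:
    """Per-file work; independent of every other file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def scan_files(paths: list[str], max_workers: int = 8) -> dict[str, int]:
    """Process each file on a pooled thread; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(count_lines, paths)))


def demo() -> list[int]:
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(1, 4):
            path = os.path.join(tmp, f"file{i}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write("line\n" * i)
            paths.append(path)
        return sorted(scan_files(paths).values())


if __name__ == "__main__":
    print(demo())
```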
OVER 9000!!!!!!11111one
A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
If you put 1000 cores on a chip and plug it into a PC... very little would happen in terms of speedup.
What we need to know is the memory architecture. How is memory allocated to cores? How is data transferred? What are the relative costs of accesses? How are the caches handled?
Without that information, it's pointless to think about hundreds or thousands of cores. And I suspect even Intel doesn't know the answers yet. And there's a good chance that a company other than Intel will actually deliver the solution.
Last time I checked, my computer had only one GPU core, which had a multitude of functional units. So does my CPU, in fact, but the GPU has more. Each CPU core has its own "context" (the state of certain registers which store pointers, and the flags register). More CPU cores means more contexts means fewer context switches means cheaper threads. Pretty simple!
CUDA &c are cool in that they offer you a way to use your video card for non-video applications when it is idle. However, their use is likely to be cyclical. It seems that we go through phases of having lots of custom hardware, and then getting cheap horsepower to throw at problems and thus having less custom hardware and doing more things in software, then having things flop back the other way. The PC was originally an expression of software-heavy use, but these days we have standard graphics processors and physics processors are even gaining some ground. Eventually the processors will take another big jump (having a thousand cores would qualify) and then everyone will want to do all this stuff on the CPU again, because a) it will be able to do it and b) you won't have to mess with two processors to get one job done.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I have two comments:
1) Intel is doing this because they've run out of optimizations on single-core systems; basically, this is the only thing they have left to preserve their market. I expect this time next year you'll see ARM SoCs with 1 GHz+ processors that draw under 1W of power and sell for under $10. These cores will be changing the low end of the market. Intel won't be able to continue to charge $50 for a processor when you get the same or better perf for 1/5 the cost. The only real advantage Intel has is that Windows XP/Vista doesn't run on ARM.
2) The Processor Company Graveyard is filled with companies that touted parallel processing solutions that were going to revolutionize the world of computing. Parallel processing is extremely difficult and only fits a subset of computing needs; we will need fast single-processor systems for a long time to come. I wish Intel luck on this endeavor; everyone else has failed miserably.
Hi. I make processors. I know a lot about processors. I think a big change is coming to processors. I think you should learn to use a lot of processors. A whole lot of processors. You need more processors. Oh, and did I tell you I make processors?
They'll have an excuse if we have 3D monitors at that point
3D monitors already exist and are available for purchase; there are even some that don't need glasses. To go with those, nVidia has stereo drivers up on their website that will work on all their cards and with most games. (Last I checked, ATI's stereo drivers only work on their workstation cards).
To make a game work in 3D, the graphics card just renders two images -- one for each eye; that's not enough work to be used as an excuse for poor performance. Of course, you can always increase the size of armies and such if you WANT to lower performance. They'll find a way.
http://en.wikipedia.org/wiki/Autostereoscopy
SGI and/or Cray were using NUMA a decade ago.
Only the State obtains its revenue by coercion. - Murray Rothbard
Can't they just make the existing ones go faster? Seriously, if I wanted to start architectures around thousands of independent threads of execution, I'd start with communication speeds, not node count.
It's already easy to spawn thread armies that peg all IO channels. Where is all this "work" you can do without any IO?
I think Intel had better start thinking of "tens, hundreds or even thousands" of bus speed multipliers on their napkin drawings.
Aside from some heavy processing-dependent concepts (graphics, complex mathematical models, etc.), the world needs petabyte/sec connectivity, not instruction set munching.
Databases provide a wonderful opportunity to apply multi-core processing. The nice thing about a (good) database is that queries describe what you want, not how to go about getting it. Thus, the database can potentially split the load up to many processes and the query writer (app) does not have to change a thing in his/her code. Whether a serial or parallel process carries it out is in theory out of the app developer's hair (although dealing with transaction management may sometimes come into play for certain uses.)
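To illustrate how an engine could split work the query author never sees: a query like SELECT region, SUM(amount) FROM sales GROUP BY region (the table and columns are hypothetical) can be computed as partial sums over partitions of the rows, then merged. A minimal Python sketch of that split-and-combine idea, not a description of how any particular database executes it:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of a "sales" table: (region, amount)
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 3)]


def partial_sum(partition: list[tuple[str, int]]) -> Counter:
    """Aggregate one partition independently of all the others."""
    totals: Counter = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def group_by_sum(rows: list[tuple[str, int]], n_partitions: int = 2) -> dict[str, int]:
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum, partitions))
    combined: Counter = Counter()
    for p in partials:
        combined.update(p)  # Counter.update adds counts rather than replacing them
    return dict(combined)


if __name__ == "__main__":
    print(group_by_sum(rows))
```

The caller gets the same answer whatever n_partitions is, which is the point: the degree of parallelism is the engine's decision, not the query writer's.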
However, query languages may need to become more general-purpose in order to have our apps depend on them more, not just for business data. For example, built-in graph (network) and tree traversal may need to be added and/or standardized in query languages. And we may need to clean up the weak points of SQL and create more dynamic DBs to better match dynamic languages and scripting.
Being a DB-head, I've discovered that a lot of processing can potentially be converted into DB queries. That way one is not writing explicit pointer-based linked lists etc., locking oneself into a difficult-to-parallelize implementation.
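For tree traversal, at least, standard SQL already has recursive common table expressions (WITH RECURSIVE, introduced in SQL:1999), and SQLite supports them. A minimal sketch using Python's built-in sqlite3 module against a hypothetical category table:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
        INSERT INTO category VALUES
            (1, NULL, 'root'),
            (2, 1, 'hardware'),
            (3, 2, 'cpu'),
            (4, 2, 'gpu'),
            (5, 1, 'software');
    """)
    return conn


def subtree(conn: sqlite3.Connection, root_id: int) -> list[tuple[str, int]]:
    """Return (name, depth) for root_id and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, s.depth + 1
            FROM category AS c JOIN sub AS s ON c.parent_id = s.id
        )
        SELECT name, depth FROM sub ORDER BY depth, name
    """, (root_id,)).fetchall()


if __name__ == "__main__":
    print(subtree(make_db(), 2))  # [('hardware', 0), ('cpu', 1), ('gpu', 1)]
```

The query states which rows belong to the subtree, not how to walk pointers to find them, which leaves the engine free to choose the traversal strategy.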
Relational engines used to be considered too bulky for many desktop applications. This is partly because they make processing go through a DB abstraction layer and thus are not using direct RAM pointers. However, the flip-side of this extra layer is that they are well-suited to parallelization.
Table-ized A.I.
Back in 2002 when I was working for a software company that was using OCR on hundreds of thousands of images, I was pushing clustered computing. I had an engineer (not one of ours) tell me that it would probably never be practical to develop software to take advantage of multiple processors. I wonder what he would say today.
Never leave a dead horse unbeaten!
This is an effective method as long as the processor is able to manage its load properly internally.
i.e., if a processor has, say, 100 cores with a combined processing capacity per unit time of Z, there are X threads, and the processing capacity of one core per unit time is Y, then XY must always equal Z. The challenge is how you manage core loads within the CPU; if Intel can solve that, uber-multicore can really take off.
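One classic heuristic for the load-management question (a textbook scheduling technique, not anything Intel has described) is longest-processing-time-first: sort tasks by cost and always hand the next one to the least-loaded core. A minimal Python sketch with hypothetical task costs:

```python
import heapq


def assign_tasks(task_costs: list[float], n_cores: int):
    """Longest-processing-time-first: each task goes to the currently least-loaded core."""
    heap = [(0.0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    assignment: dict[int, list[int]] = {core: [] for core in range(n_cores)}
    # Visit tasks from most to least expensive (ties keep their original order).
    for task, cost in sorted(enumerate(task_costs), key=lambda tc: tc[1], reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(task)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: load for load, core in heap}
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = assign_tasks([5, 4, 3, 3, 3], n_cores=2)
    print(assignment)  # {0: [0, 3], 1: [1, 2, 4]}
    print(loads)
```

The total work always equals the sum of the task costs; what the scheduler controls is how evenly it is spread, and greedy rules like this one are not always optimal (for costs 5, 4, 3, 3, 3 on two cores it produces loads of 8 and 10, where 9 and 9 is possible).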
That's no joke. It's not at all unusual to have to wait hours for tens of thousands of core files to be produced on large HPC machines. Debugging at scale is a really, really hard problem.
Is it bad that my first thought when I saw this was: "But, my code already generates thousands of cores..."
What's different this time may be that nobody else has anything better. Last time, AMD64 was the easier solution, and it clobbered Itanium. Can AMD (or anybody) simply choose to keep making single cores faster, or is multi-core the way CPUs really must go from here?
And now they're bringing it back?
We all learned that 1000 cores don't matter if each core can only process a simplified instruction set, compared to 2 cores that can handle more data per thread.
This is basic computer design here, people.
They're using their grammar skills there.
By definition, isn't a core just the middle/root of something? If you have more than one core, shouldn't the term really be changed to reflect something closer to what it represents?
----- Concentrate on promoting more than demoting.
I think that gcc should insert code to control memory leaks and process safety and the kernel should be in charge of tasking between cores.
Please limit this desire to languages like Java, Python, and Ruby. We don't need this in C. If you can't program without it, you shouldn't be programming in C.
now we need to go OSS in diesel cars
How is that back to the Amiga?
The PC platform hit Amiga levels well over a decade and a half ago, with dedicated graphics hardware, dedicated audio hardware, dedicated network hardware, a numerical coprocessor, and so on. People need to stop claiming every new change finally brings things back to the Amiga. That argument is terribly old.
And yeah I was into the Amiga and Atari ST and Mac Classic back in those days, but then I moved on.
He meant 640k CORES should be enough for anybody.
Why?
The kind of parallelism needed for a few cores (coarse-grained task parallelism) is entirely different than the kind of parallelism needed for hundreds or thousands of cores (fine-grained data parallelism). Designing for a few cores won't do us a damn bit of good when we have hundreds or thousands.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, that was based on exactly these assumptions and failed miserably because of them.
With Itanium, they were trying to say compiler improvements could handle it invisibly, with no work from the application programmers. Taking advantage of more than two cores (since one can take care of other programs that would have slowed down your app) is going to take conscious thought about what can and can't be parallel. Taking advantage of more than a handful is going to take more fundamental shifts in how we program. They're asking a lot more this time.
On the other hand, you could easily opt out of Itanium. Now, this is the only way your programs are going to get much future processing improvement. Ever. No matter who you're buying CPUs from.
I'd say that it could have a rather hefty impact on the graphics industry (though to be fair, both tend to share tech fairly regularly as it is) as well as many others.
How about servers? If you have 1000 cores, and 1000 clients connecting through the network, then each core could service a client (though depending on what they're doing, IO and other issues also rear their heads). Another nice aspect would be that if you could fix a process to a certain # of cores, you could always be sure that it wouldn't max out your entire CPU capacity.
The point is that this is going to happen, whether anyone likes it or not.
CPU clock speeds ran into the brick wall a few years ago. Here is a chart showing CPU clocks from 1993 to 2005.
There have been no major performance improvements from that direction for the last few years, and probably won't be any more without a major breakthrough in semiconductors.
Moore's law is about transistor counts, and shows no real signs of stopping. Every 18 to 24 months, we double the number of transistors on a given wafer/die. The transition to 64-bit CPUs used a generation or two of those extra transistors, but we aren't likely to move to 128 bits soon. We are already pretty deep into the diminishing-returns curve for on-die cache.
What is left to consume those transistors?
More cores. Lots more cores. If you replace your CPU every 2 years, you can pretty much bet that each one you buy for the next decade or so will have twice as many cores as the one it is replacing.
And if developers and compilers get good at managing parallel code (and they have no choice in this), you can expect core counts to go up even faster than doubling every couple of years.
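The arithmetic behind the doubling claim, as a tiny sketch; the 4-core starting point and the two-year doubling period are assumptions taken for illustration:

```python
def projected_cores(start_cores: int, years: float,
                    doubling_period_years: float = 2.0) -> float:
    """Core count after 'years' if it doubles every doubling_period_years."""
    return start_cores * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 4, 6, 8, 10):
        print(years, projected_cores(4, years))
```

Over a decade that is five doublings, so 4 × 2⁵ = 128 cores under these assumptions.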
See that "Preview" button?