Intel Says to Prepare For "Thousands of Cores"
Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed their plans for the future and it goes well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousand or cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence.'"
- and - oh my God - it's full of cores!
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
Impiuos?! Bow down thee to the Gods of Grammar!!
I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't. If you have 1,000 half-gigahertz cores running a 3D simulation, you may be able to get 875 FPS out of Doom X at 1920x1440, but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?
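The question above is what Amdahl's law puts a number on (naming that law is an addition here; the comment does not mention it). A minimal sketch, assuming a hypothetical program where some fraction of the work parallelizes perfectly and the rest is strictly sequential; the function name and the sample fractions are illustrative only:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Sample fractions are made up for illustration, not measurements.
    for p in (0.50, 0.90, 0.99):
        for n in (2, 64, 1000):
            print(f"parallel fraction {p:.2f}, {n:4d} cores: "
                  f"at most {amdahl_speedup(p, n):6.2f}x")
```

Under these assumptions, a program that is half sequential can never quite double its speed, however many cores are added; that part only gets faster with more cycles per core.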
Start a happiness pandemic
now, imagine a beowulf cluster of .. oh never mind.
As if Oracle licensing wasn't complicated enough already...
-- "Other than that, how was the play Mrs. Lincoln?"
If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?
I'm perfect in every way, except for my humility.
Maybe Program X will finally not be so slow.
It's a series of tubes; um cores.
How about a Beowulf clust... I can't even do that one.
At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general, and tried to predict what computers (especially Supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel -- most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc., is not going to benefit much from having all that compute power. (Gamers are obviously not included in this.) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Are we just looking at crazy-ass multithreading? Or do you mean we need some special API? I think it's really the compiler gurus who are going to make the difference here - 99% of the world can't figure out debugging multithreaded apps. I'm only moderately successful with it if I build small single-process "kernels" (to steal a graphics term) that process a work item, and then a loader that keeps track of work items... then fire up a bunch of threads and feed the cloud a bunch of discrete work items. Synchronizing threads is no fun.
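The kernel-plus-loader pattern described above could look roughly like this. It is only a sketch using Python's standard `concurrent.futures`; the names `process_item` and `run_all` and the squaring workload are hypothetical stand-ins, not the commenter's actual code:

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item: int) -> int:
    """The small single-purpose 'kernel': handles one work item, shares no state."""
    return item * item


def run_all(items, workers: int = 4):
    """The 'loader': hands discrete work items to a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the caller never has to
        # synchronize threads by hand.
        return list(pool.map(process_item, items))


if __name__ == "__main__":
    print(run_all(range(10)))
```

Because each work item is independent, there are no locks to get wrong. In standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so for real number crunching one would likely swap in `ProcessPoolExecutor`, which offers the same interface.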
meh
and imagine all those cores in a box running a bunch of virtual machines. every dba team will need an accountant.
Heck, my original computer had 229376 cores. They were arranged in 28k 16 bit words.
I'm an American. I love this country and the freedoms that we used to have.
all of those DIMMs of RAM. I'm thinking they will have to come up with something smaller. Maybe more than one DIMM on a... DIMM?
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
640K cores should be enough for anybody.
It's fairly obvious that both Intel and AMD are heading this way. The transistors are shrinking, but we will soon create a transistor that cannot be shrunk further, and once this happens, we will have to think layers and cores and possibly more GHz.
So whether programmers find this move acceptable or not is irrelevant, because this path is probably the only way to design faster CPUs once we've hit the nanometer wall.
Full Tilt
We already have systems with tens and hundreds of cores. Those processors already go by the name of "graphics card" and those changes in languages and libraries go by the name of CUDA, C2M, Brook+ and the like.
The only thing new that Intel brought to the table with this press release is the attempt to fool us into believing that there is nothing of the kind available and that Intel is somehow innovating in some aspect or another.
Face it: the age of the "CPU is the computing muscle" is long gone.
Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
It's a good idea. It's somewhat the same idea that the Cell chip has going for it (and, well, Phenom X3s). You make a product with lots of redundant units so that when some are bound to fail, the percentage lost to failure is much lower.
If there are 1000 cores on a chip, and 100 go bad, you're still only losing a *maximum* of 10% of performance, whereas when you have 2 or 4 cores and 1 or 2 go bad, you take a performance hit of essentially 50%. That brings costs down because yields go up dramatically.
It's over 9000!!
Knowledge is power. Knowledge shared is power lost.
The optimal # of cores will inevitably wind up being 42, but nobody should ever need more than 640K cores.
Perhaps a machine that can configure the # and type of cores it needs on the fly will come about some day.
I'm more interested in on-die RAM for now. A combined CPU/GPU/RAM hooked to SSD storage. Yum.
Where's my Singularity? I was promised a Singularity!
What's the use until programmers start learning effective parallel programming? Right now, it's game developers who are winning that game, with graphics and movie editors right behind them.
Colin Dean Go a year without DRM
I read that as 'Thousands of Crows' and had this great image of flocks of birds flying out of my ethernet port, delivering packets to the world.
http://twitter.com/OLDTELEGRAM
Supercomputers already have many more than thousands of cores. The IBM Blue Gene/P can have up to 1,048,576 cores. What Intel is probably talking about is bringing that level of parallel computing to smaller computers.
I understand why Intel is so interested in multiple cores - they can't make the faster single-core chips that the market wants.
The question is what's our motivation? Unless software performance is approximately linearly proportional to the number of cores (e.g. a 10 core cpu can run a software application 10x faster than it ran before it was made core-aware), it probably isn't worth converting legacy apps.
In order to utilize mega-core processors, I believe that we need to rethink the way we program computers. Instead of using imperative programming languages (eg, C, C++, Java) we might need to look at declarative languages like Erlang, Haskell, F# and so on. Read more about this at http://olvemaudal.wordpress.com/2008/01/04/erlang-gives-me-positive-vibes/
"Problem #6.
If Intel makes a machine with 875 cores and there are 413 machines in your Beowulf Cluster, how many total cores are there?"
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
"Oh, the fools! If only they'd built it with 6000 and 1 cores"
Say you have a slow, plodding sequential process. If you reach a point where there are several possibilities and you have an abundance of cores, you can start work on each of the possibilities while you're still deciding which possibility is actually the right one. Many CPUs already incorporate this sort of logic. It is, however, rather wasteful of resources and provides a relatively modest speedup. Applying it at a higher level should work, in principle, although it obviously isn't going to be practical for many problems.
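Applied at the program level, the idea above might be sketched as follows. Everything here is a hypothetical stand-in: `slow_decision` plays the part of the slow sequential step, the two branch functions are the "possibilities", and the sleeps merely simulate work:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_decision() -> bool:
    """Stand-in for the slow, sequential step that picks a branch."""
    time.sleep(0.05)
    return True


def branch_if_true() -> str:
    time.sleep(0.05)
    return "result of the true branch"


def branch_if_false() -> str:
    time.sleep(0.05)
    return "result of the false branch"


def speculate() -> str:
    """Start both branches before the decision is known, then keep one."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        decision = pool.submit(slow_decision)
        if_true = pool.submit(branch_if_true)
        if_false = pool.submit(branch_if_false)
        chosen = if_true if decision.result() else if_false
        # The other branch's work is thrown away, which is exactly the
        # waste of resources the comment warns about.
        return chosen.result()


if __name__ == "__main__":
    print(speculate())
```

With two candidate branches this costs one wasted task; with a branch at every step the number of speculative tasks grows quickly, which is why it is unlikely to be practical for many problems.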
I do see this move by Intel as a direct follow up to their plans to negate the processing advantages of today's video cards. Intel wants people running general purpose code to run it on their general purpose CPUs, not on their video cards using CUDA or the like. If the future of video game rendering is indeed ray-tracing (an embarrassingly parallel algorithm if ever there was one) then this move will also position Intel to compete directly with Nvidia for the raw processing power market.
One thing is for sure, there's a lot of coding to do. Very few programs currently make effective use of even 2 cores. Parallelization of code can be quite tricky, so hopefully tools will evolve that will make it easier for the typical code-monkey who's never written a parallel algorithm in his life.
I'm all for newer, faster processors. Hell, I'm all for processors with lots of cores that can be used, but wouldn't completely redoing all of the software libraries and such that we've got used to cause a hell of a divide in developers?
Sure, if you only develop on an x86 platform, you're fine, but what if you want to write software for ARM or PPC? Processors that might not adopt the "thousands of cores" model?
Would it not be better to design a processor that can intelligently utilise single threads across multiple cores? (I know this isn't an easy task, but I don't see it being much harder than what Intel is proposing here).
Or is this some long-time plan by Intel to try to lock people into their platforms even more?
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Honestly I wonder if Intel isn't looking at the expense of pushing per-core speed further and comparing it against the cost of just adding more cores. The unfortunate reality is that the many-core approach really doesn't fit the desktop use case very well. Sure, you could devote an entire core to each process, but the typical desktop user is only interested in the performance of the one process in the foreground that's being interacted with.
It's also worth mentioning that some individual applications just aren't parallelizable to the extent that more than a couple of cores could be exercised for any significant portion of the application's run time.
Arguing about vi versus Emacs is like arguing whether it's better to make fire by rubbing sticks or banging rocks.
I'd be surprised if a desktop PC ever really uses more than eight. Desktop software is sequential, as you said. It doesn't parallelize.
Games will be doing their physics, etc., on the graphics card by then. I don't know if the current fad for doing it on the GPU will go anywhere much but I can see graphics cards starting out this way then going to a separate on-board PPU once the APIs stabilize.
We might *have* 64 cores simply because the price difference between 8 and 64 is a couple of bucks, but they won't be used for much.
No sig today...
Dozens, hundreds, and even thousands of cores are not unusual design points
I don't think they mean cores like the regular x86 cores, I think they will put an FPGA on the same die together with the regular four/six cores.
but can we PLEASE work on getting apps to run on more than just ONE core/processor for now? I mean it's amazing how many apps still are not SMP aware unless you can (possibly) compile them for that purpose and even then you don't always get the kind of increase you would expect. Let's start with programming on 2 cores and then maybe go to 4 or more. For now, the kind of stuff Intel is discussing is moot. If we can't get shit to run on 2 procs, why do we bother with thinking about 12?
Pax Vobiscum
Intel tried to push the complexities of increasing computing speed off into software before. When they designed the Itanium, they figured that the software compiler would magically find extra concurrency in the apps and utilize the large number of functional units in the core, and that this would make other architectures obsolete. Well, it didn't quite work out as they planned.
Hopefully they won't spend $Billions going down the "hypothetical software will enable radical hardware changes" road again just to learn the same lesson as last time.
It has been long taught in theory classes that certain things can be solved in fewer steps using nondeterministic programming. The problem is that you have to follow multiple paths until you hit the right one. With sufficiently many cores the computer can follow all the possible paths at the same time, resulting in a quicker answer. http://en.wikipedia.org/wiki/Non-deterministic_algorithm http://en.wikipedia.org/wiki/Nondeterministic_Programming
I will not mourn that which I never had to lose. - Unknown
oh nevermind, what's the point?
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Prepare for thousand core dumps!
Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab
don't trust him! he just wants the ring!
I'm a rabbit startled by the headlights of life
How are they going to cope with excessive heat and power consumption? How are they going to dissipate heat from a thousand cores?
When the processing power growth was fed by shrinking transistors, the heat stayed at a manageable level (well, it gradually increased with packing more and more elements on the die, but the function wasn't linear). Smaller circuits yielded less heat, despite there being many more of them.
Now we're packing more and more chips into one package instead and shrinkage of transistors has significantly slowed down. So how are they going to pack those thousand cores into a small number of CPUs and manage power and heat output?
That problem was solved by VPU design a long, long time ago: there is an array of memory controllers, each of which has a multiplexed pipe to every core in its block. The idea is that data gets pipelined between the processor and memory, and requests for start of transfer and end of transfer only change the selector on the multiplexer. Any outstanding transfer requests get queued. Furthermore, data gets cached, and the cores have direct access to that cache.
. . . thousands of cores are less than amusing . . .
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
I mean, even your average plod starting Outlook on Vista, is starting dozens of fancy visualisation thingies, anti-spam algorithms, networking things...
Properly programmed, it can be torn apart into hundreds of tasks. It is not going to speed up your terminal window, no. But does your terminal window need speeding up?
10 ?"Hello World" life was simple then
is find out how to program that. I'm a programmer and I know the problems that are involved in (massive) parallel programming. For a lot of problems, it is either impossible or very hard. See also my essay 'Why does software suck' (dutch) (babelfish translation).
-- The Internet is a too slow way of doing things, you'd never do without it.
Games are already so suspiciously inefficient at managing the hardware they run on in order to help hardware companies push their newer products. It's going to be fun to watch games in the future somehow slow a 1000-core cpu to a crawl on the low detail setting, to help sell the 2000-core models.
They'll have an excuse if we have 3D monitors at that point, otherwise they'll just have to bullshit about particle effects taking more power (even on the low detail settings).
"When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
posting anon so dont mod mee
what is n/t??
I sure would like to have 1000 Coors
Sure it would be nice to have 1000 cores in the CPU but it would suck if they were all 386SX 16MHz.
We are Dead Stars looking back Up at the Sky
Yes, but...
Each choice point requires a core. You WILL run out of cores; it really doesn't matter how many you have... Unless, of course, you solve THAT problem, which scores you a Nobel Prize.
Just another "Cubible(sic) Joe" 2 17 3061
I've only been programming professionally for 3 years now, but already I'm shaking in my boots over having to rethink and relearn the way I've done things to accommodate these massively parallel architectures. I can't imagine how scared the old-timers of 20, 30, or more years must be. Or maybe the good ones who are still hacking decades later have already had to deal with paradigm shifts and aren't scared at all?
"Ask me about Loom"
If you have so many cores, then unless they are purely crunching numbers (and even then they'll have to spit something out some time), isn't access to RAM going to limit the throughput of your system?
A lot.
factor 966971: 966971
So now we have a shit load of cores all we have to do is wait for the developers to put some multi-threading goodness in their apps.... or maybe not.
The PS3 was meant to be faster than any other system because of its multi-core Cell architecture, but in an interview John Carmack said, "Although it's interesting that almost all of the PS3 launch titles hardly used any Cells at all."
http://www.gameinformer.com/News/Story/200708/N07.0803.1731.12214.htm
64K cores is enough for anybody.
If people are writing their applications using threads, I don't see why there should be a big problem with more cores. Basically, threads should be used where it is practical and makes sense and does not make programming that much more difficult; in fact it can make things easier. Rather than requiring some overly complicated reengineering, threads when properly used can lead to programs that are just as easy to understand. They can be used for a program that does many tasks; processing can usually be parallelised when you have different operations which do not depend on the output of each other. A list of instructions in which each depends on the output of a previous instruction, and which must therefore run sequentially, of course cannot be threaded or parallelised. An obvious example of an application that can be threaded is a server, where you have a thread to process data from each socket; a program which scans multiple files can have a thread for processing each file, etc.
OVER 9000!!!!!!11111one
A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
Only 413, as sadly I could not afford even one of those 875 core machines, much less 413.
Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
Sure, you need to rewrite code, but you need to do that anyway to get massive parallelism. At least occam provides the parallelism at a language level.
Engineering is the art of compromise.
If you put 1000 cores on a chip and plug it into a PC... very little would happen in terms of speedup.
What we need to know is the memory architecture. How is memory allocated to cores? How is data transferred? What are the relative costs of accesses? How are the caches handled?
Without that information, it's pointless to think about hundreds or thousands of cores. And I suspect even Intel doesn't know the answers yet. And there's a good chance that a company other than Intel will actually deliver the solution.
Sweet, Excel will be able to compute every cell in a worksheet to 65535 in parallel! What a time saver!
today is spelling optional day.
Today's CPUs are already powerful enough for almost any amount of "look-ahead" I can imagine.
What predictive process would need multiple cores to be able to do their thing?
I have two comments:
1) Intel is doing this because they've run out of optimizations on single-core systems; basically this is the only thing they have left to preserve their market. I expect this time next year you'll see ARM SoCs with 1GHz+ processors that draw under 1W of power and sell for under $10. These cores will be changing the low end of the market. Intel won't be able to continue to charge $50 for a processor when you get the same or better perf for 1/5 the cost. The only real advantage Intel has is that Windows XP/Vista doesn't run on ARM.
2) The Processor Company Graveyard is filled with companies that have touted parallel processing solutions that were going to revolutionize the world of computing. Parallel processing is extremely difficult, and only fits a subset of computing needs; we will need fast single processor systems for a long time to come. I wish Intel luck on this endeavor; everyone else has failed miserably.
no text.
That space intentionally left blank.
:x
Hi. I make processors. I know a lot about processors. I think a big change is coming to processors. I think you should learn to use a lot of processors. A whole lot of processors. You need more processors. Oh, and did I tell you I make processors?
Which of your daily compute tasks is bogging down and could use a boost from multiple CPUs?
Messenger? Word? Email?
I'm guessing "none of the above".
Games ... up to a point. Today's games are already pretty realistic on single/dual cores and the work is already being moved to dedicated CPUs (eg. graphics cards) leaving the CPU mostly idle.
Maybe you compress a lot of video. That's the only thing which could really benefit, but that's hardly a common task.
SGI and/or Cray were using NUMA a decade ago.
Only the State obtains its revenue by coercion. - Murray Rothbard
Java takes advantage of multi-threaded functionality extremely well, and the API simplifies things quite a bit. To suggest that we need to rethink the way we program computers may just be a personal issue of yours.
I like lisp a lot actually; I think it's fun, so don't get me wrong.
However - the way Lisp is written is merely a style, which can be adopted in any imperative language as well. Unless you're talking about dynamic, self-altering code (which is a totally different subject than concurrency), there are no distinct advantages to the Lisp compiler that cannot be utilized in Java.
I don't think it's a technological limit, but rather an economic one; lack of competition in the high-end CPU market is why you don't see clocks like that. AMD simply have nothing to offer as competition in that domain. There is little doubt in my mind that Intel is capable of making 3.8GHz and even 4GHz CPU models, because many people have overclocked the newer 45nm dual-core chips to such levels without much hassle. If memory serves, up until around 3.5GHz you don't even need anything more than the stock cooling system!
Call Ripley's!! Intel pushing for a future in which they can sell more silicon!
I like turtles.
http://www.youtube.com/watch?v=CMNry4PE93Y
Can't they just make the existing ones go faster? Seriously, if I want to start architectures around 1000's of independent threads of execution, I'd start with communication speeds, not node count.
It's already easy to spawn thread armies that peg all IO channels. Where is all this "work" you can do without any IO?
I think Intel had better start thinking of "tens, hundreds or even thousands" of bus speed multipliers on their napkin drawings.
Aside from some heavy processing-dependent concepts (graphics, complex mathematical models, etc) the world needs petabyte/sec connectivity, not instruction set munching.
but no programming languages or tools to take advantage of them. Most software is only written for one core. Very few if any support even dual cores.
I'd much rather see quantum computing become a reality instead of seeing a thousand cores and no way to make use of them all.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
Databases provide a wonderful opportunity to apply multi-core processing. The nice thing about a (good) database is that queries describe what you want, not how to go about getting it. Thus, the database can potentially split the load up to many processes and the query writer (app) does not have to change a thing in his/her code. Whether a serial or parallel process carries it out is in theory out of the app developer's hair (although dealing with transaction management may sometimes come into play for certain uses.)
However, query languages may need to become more general-purpose in order to have our apps depend on them more, not just business data. For example, built-in graph (network) and tree traversal may need to be added and/or standardized in query languages. And we may need to clean up the weak points of SQL and create more dynamic DBs to better match dynamic languages and scripting.
Being a DB-head, I've discovered that a lot of processing can potentially be converted into DB queries. That way one is not writing explicit pointer-based linked lists etc., locking one into a difficult-to-parallelize implementation.
Relational engines used to be considered too bulky for many desktop applications. This is partly because they make processing go through a DB abstraction layer and thus are not using direct RAM pointers. However, the flip-side of this extra layer is that they are well-suited to parallelization.
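A small illustration of the "describe what you want, not how" point, using Python's built-in `sqlite3` module. The `sales` table, its columns, and the function name are made up for the example, and SQLite is used only because it ships with Python; nothing here claims that SQLite itself spreads this query across cores:

```python
import sqlite3


def total_by_region(rows):
    """Ask for totals per region declaratively; the engine decides how to compute them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # No loops, pointers, or threads appear in the application code.
        # Whether the grouping runs serially or in parallel is the engine's business.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_by_region([("east", 1), ("west", 2), ("east", 3)]))
```

The same query text could, in principle, be handed to an engine that partitions the table across many cores without the application changing a line.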
Table-ized A.I.
The Amiga had a 68000 chip for the main CPU but had custom processors for the graphics, sound, and I/O and AmigaDOS/AmigaOS was built around it.
We have come full circle now with dual core and up chips and the GPU being built into the CPU now, back to the Amiga, which was a superior system design.
The OS will have to be rewritten to support all of the new cores and special built in GPUs and other features. Windows, Linux, and Mac OSX need to become more like AmigaOS. Small in memory footprints and able to handle multiple processors.
Back in 2002 when I was working for a software company that was using OCR on hundreds of thousands of images, I was pushing clustered computing. I had an engineer (not one of ours) tell me that it would probably never be practical to develop software to take advantage of multiple processors. I wonder what he would say today.
Never leave a dead horse unbeaten!
This is an effective method as long as the processor is able to manage its load properly internally.
I.e., if a processor has, say, 100 cores with a combined processing capacity per unit time of Z, and there are X threads, and the processing capacity of 1 core per unit time is Y, then XY must always equal Z. The challenge is how you manage core loads within the CPU; if Intel can solve that, uber multi-core can really take off.
the googol core.
There are 10 types of people in the world. Those that understand this sig, and those that beat up people who do.
Recently Intel came out with its Atom core. It is going into all sorts of things because it is smaller (1/10th the size of a normal core) and it draws a lot less power. It also has about half the power/cycles of some of the bigger cores. Yet this is more than enough for the normal user (/.ers excepted). All the normal people (my family) want to do is surf the web, check email, and watch movies: stuff that even Damn Small Linux can do. So it kind of begs the question: if a smaller and less powerful processor is selling so well, what kind of sales could we expect from something with a thousand or more cores?
And unganged mode access too.
(i.e.: AMD Phenoms have dual channel memory controllers too. But those don't function as dual channel to boost 2x the bandwidth, but instead function as 2 independent controllers to help more tasks access memory at the same time)
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Occasionally, you'll also see it written as: nt, NT, no text, $, (cent symbol), (euro symbol), (British pound symbol), etc. Generally, though, it should be placed at the end of the subject to tell people not to bother with opening the whole message.
Just as I started learning x86 assembly. I'm already about 20 hours into it and I'm not about to stop, but it may prove to be a waste of time 10 years from now.
You're nothing; like me.
Thousands of Gradius fans crying in agony!
The only thing new that Intel brought to the table with this press release is the attempt to fool us into believing that {...} Intel is somehow innovating in some aspect or another.
The big innovation, according to Intel, is that:
- Intel's manycore chips actually use the x86 ISA, and thus can be used standalone, as main processors, whereas current GPUs are rather special architectures. One can use them for special-purpose computations, but one can't get the OS to run on them (most current GPUs have limited branching abilities and completely lack any function-calling capabilities beyond what is possible by inlining).
- Another argument from Intel is that, because the x86 ISA is so much more popular, it will be easier to develop for and to learn to use manycore chips (with everything looking much more like what it was on the desktop) than today's GPGPU, which requires special libraries and special languages.
Whether these are right is a non-trivial question best left to the reader's discretion.
Face it: the age of the "CPU is the computing muscle" is long gone.
Well, at least until the next turn of the Wheel of reincarnation
Is it bad that my first thought when I saw this was: "But, my code already generates thousands of cores..."
But with 1,000 cores perhaps Microsoft could speed up Vista or Windows 7 so that 498 cores are devoted to Aero, 498 to DRM, and the last four could be used for work.
Q: What does the "B." in Benoit B. Mandelbrot stand for? A: Benoit B. Mandelbrot
Azul Systems already ships a 54-core chip with systems up to 864-way SMP. Not quite thousands, but getting close. Today.
Their page
What's different this time may be that nobody else has anything better. Last time, AMD64 was the easier solution, and it clobbered Itanium. Can AMD (or anybody) simply choose to keep making single cores faster, or is multi-core the way CPUs really must go from here?
Anyone else find it odd that Apple is focusing its next release around multi-core processing?
AMD has this thing called NUMA. What do you think "HyperTransport" means?
I assumed it was just meaningless marketing jargon, like Sega's "Blast Processing".
They're all 'or cores'. Great, I'm always thinking "boy I wish I could do thousands of OR's at once."
because this IS the main selling point on any desktop computer
Software bloat will increase to fill the available hardware and we will be in the same boat.
---- Booth was a patriot ----
And how are we supposed to feed all of these cores? What kind of memory and I/O bandwidth are we talking about? Cache sizes?
I think Sun's "Niagara" chips are actually ahead of the curve on all of this, and while Sun may have many issues, designing systems with lots of I/O is not one of them.
Sort of like the talk about how Itanium was going to require new compiler designs etc etc. Look how that turned out.
I'm surprised no one has mentioned pure functional programming. It is 'side-effect free' so you can take a block of code and drop it on any core. This is the future.
Also, look at Google's Map/Reduce design. A great number of problems can be re-expressed in terms of map and reduce.
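As a toy illustration of re-expressing a problem as map and reduce, here is a word count. This is not Google's implementation, only the general shape, run with in-process threads; the function names and the chunking are assumptions made for the sketch:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(text: str) -> Counter:
    """Map step: count words in one chunk, independently of every other chunk."""
    return Counter(text.lower().split())


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers: int = 4) -> Counter:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is side-effect free, so any worker can take any chunk.
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

Because `map_chunk` touches no shared state and `merge` is associative, the chunks can be processed in any order on any core, which is the property the comment above attributes to side-effect-free code.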
http://en.wikipedia.org/wiki/XMTC ?
The more cores we have the better. Provided that we can supply memory bandwidth to the device.
With 1024 cores, that is definitely going to bottleneck an external memory bus. So put the memory on the chip, instead. That would certainly be a lot fewer cores. You either have smaller boards or more chips on the board.
Next, a system on a chip (SoC) complete with 3-D accelerated graphics. And you wonder why AMD bought ATI.
now we need to go OSS in diesel cars
Despite the huge push towards writing programs optimized to control their own threading, I still cannot comprehend why this is not left to the kernel tasker.
If the kernel can manage process time on one core, I feel that scaling this to work efficiently over a number of cores, with the use of semaphores to protect data and control thread access would allow for a much more efficient system level approach.
My experience controlling data access between multiple threads has been riddled with unneeded tweaking. I think that gcc should insert code to control memory leaks and process safety and the kernel should be in charge of tasking between cores.
fork() that intel.
Communism, its a party!
How much would someone bet that those will follow the very same restrictions that current GPUs have when they're used as stream processors? There aren't 10,000 ways to make parallel processing efficient.
If they don't put restrictions on when and how a program can use resources, simultaneous access to memory by those cores would be a real nightmare to design, and worse to program. Currently the best way to use multiprocessing is with GPGPU techniques, _because_ of those restrictions, which make it possible to keep the GPU running without waiting too much on memory.
May I refer you to: http://tech.slashdot.org/article.pl?sid=08/05/31/1633214
Stream processing has many more applications than games or scientific computing, Intel is seeing that. But it seems like Nvidia is way ahead in that race... Let's see if Intel will take the lead.
and now they're bringing it back?
we all learned how 1000 cores doesn't matter if each core can only process a simplified instruction set compared to 2 cores that can handle more data per thread.
this is basic computer design here people.
They're using their grammar skills there.
By definition, isn't a core just the middle/root of something? If you have more than one core, shouldn't the term be changed to something closer to what it actually represents?
----- Concentrate on promoting more than demoting.
Is that the number of cores or how much heat it dissipates?
idle at the same speed...
motherboards will be equipped with an extra 128mb chip just to keep /proc/cpuinfo.
If he's lucky, he might get his hands on a few dual core/processor machines.
One of the main complaints about AI and Comp Neuro is that the brain is a massively parallel system....this sort of thing could open up all sorts of possibilities for more realistic brain simulation. As someone going into these fields, this got my attention real quick. I actually could use a beowulf cluster of these...
There is more to science than physics!
www.iomalfunction.blogspot.com
There, fixed it for you.
WTF am I doing replying to an AC at 5 A.M on a Friday night?
it seems Intel is pushing for a massive evolution in the way processing is handled.
It seems more like intelligent design to me. Intel isn't leaving the technology to morph on its own. They are actively designing it. Technology doesn't evolve. It is designed, with changes implemented on purpose.
With that many cores, you still have to wonder, is it Vista ready?
When a thousand core cpu comes out, Vista should be ready for the desktop.
http://www.accountkiller.com/removal-requested
I don't think so. We're seeing decreasing returns in multi-core computing because it is still basically multi-CPU computing and many tasks are not easily parallelized. The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, that was based on exactly these assumptions and failed miserably because of them.
There are also serious problems with I/O with lots of cores. How do you feed them all? It seems like you'd need a LOT of very fast memory and interconnects, as close to the CPU as possible. I think the only way to get this to work would be to have embedded memory for each core IN ADDITION to duplicate system memory. Possible, but extremely expensive.
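The diminishing-returns point above is usually stated as Amdahl's law: if a fraction p of the work can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). A small Python sketch follows; the 90% figure in the demo is an assumed value chosen for illustration, not a measurement from this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time parallelizes perfectly.
    for cores in (2, 8, 64, 1000):
        print(cores, round(amdahl_speedup(0.90, cores), 2))
```

Under that assumption the bound stays below 10x no matter how many cores are added, because the serial 10% never gets faster.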
Just look at Vista. M$ must have been planning waaay into the future when we'll have a million cores.
The Nintendo DS has a dual core ARM CPU.
Although they are all Turing machines in the end, a functional programming language can separate the program from the implementation. You don't think in terms of contexts; a single program may use an arbitrary number of cores in a very lightweight, low-context way.
Each one effectively removes a to-be-evaluated expression from a list, and returns its evaluated result. In the process of evaluating the expression, it may add other to-be-evaluated expressions to this list, which will be evaluated by available cores. When the list is single valued, the 'program' is complete. The idea is that the system, rather than the programmer, stretches to the available parallelism. It won't help getting the last element of a list, but an entire list can be searched in parallel, presuming the comparison is more expensive than the traversal.
When a core has completed a given expression, its context is dead, or, in other words, the expression is the context.
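The scheme described above can be sketched in Python with a shared queue standing in for the list of to-be-evaluated expressions and threads standing in for cores. This is only an illustration of the idea under those assumptions, not how any particular functional runtime works; `parallel_contains`, `leaf_size` and the `(lo, hi)` span representation are invented for the example.

```python
import queue
import threading


def parallel_contains(items, target, workers=4, leaf_size=4):
    """Search `items` for `target` using a shared list of pending expressions."""
    pending = queue.Queue()          # the list of to-be-evaluated expressions
    found = threading.Event()
    pending.put((0, len(items)))     # the initial expression: search everything

    def worker():
        while True:
            span = pending.get()
            if span is None:         # sentinel: no more work for this worker
                pending.task_done()
                return
            lo, hi = span
            if found.is_set():
                pass                 # another worker already has the answer
            elif hi - lo > leaf_size:
                mid = (lo + hi) // 2
                # Evaluating this expression adds two smaller ones to the list.
                pending.put((lo, mid))
                pending.put((mid, hi))
            elif target in items[lo:hi]:
                found.set()
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    pending.join()                   # returns once every expression is evaluated
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return found.is_set()
```

The caller never says how many pieces to split the search into; the work simply spreads over however many workers are pulling from the queue.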
two problems: 1. functional programming is hard. I know the math nerds will say it's natural, which is true if you happen to be a math nerd. 2. most functional programming languages are about as intuitive as the Windows registry.
maybe we can ask Misters K&R to write an "F" language to do for functional programming what C did to the abyss of COBOL, FORTRAN, PL/I, ... just don't let Mr S near it.
He meant 640k CORES should be enough for anybody.
So how do you handle backwards compatibility? Let's say I have an application that runs just fine on a single-core 1 GHz processor, and it's not designed to use multiple cores. Now all of a sudden I have 1000 100 MHz cores, and my app is only designed to utilize a single core. Do you hide the parallel processing at the hardware level? I.e., provide a logical "primary" processor capable of delegating tasks to its minions?
GPUs already use massively parallel processing. If you think of the 800 stream processors in the RV770 core as individual cores, then you have your hundreds of cores.
I think Intel misses the fact that in other segments of the industry, their "new shiny ideas" are old hat.
Remember when the Pentium (1) brought strong floating point to the consumer market? Look back, people: Alpha and SPARC had the importance of FP down years before Intel really figured out what was going on.
Lastly, will I be overclocking cores by the batch or do I still have to see what each core will max out at?
What's so 'massive evolution'ary about that?
Only a few years late to the game I guess.
This is one of the things I noticed about working with CUDA (the general purpose computing API for Nvidia GPUs). There's a bit of extra complexity in the planning stage (you have to be really careful managing resources, or most of the chip will sit there doing nothing), but it also tends to eliminate a couple levels of nested loops and handle a lot of indexing / addressing type stuff automatically.
Am I the only one wondering which wafer process they are going to use? Wouldn't it be better to increase cache die area instead of adding cores?
When the UMPCs started coming out using the Atom processor, a few things really stood out to me.
It seems to me that the die is very small, physically, and it is obviously a low power consumption and low heat chip.
It also isn't all that fast.
But what if you had, like, 10? 100? 1000, like the TFA says? NOW we're talkin.
Flappinbooger isn't my real name
Why did the multithreaded chicken cross the road?
to To other side. get the
Education is the silver bullet.
I'm guessing it doesn't have the "Wow!" factor you were looking to get.
It wasn't for wow factor as you presuppose.
Also, a hell of a lot of the world runs synchronously and if it didn't some very, very bad things would happen.
Sure, but there are a hell of a lot of things in this world running synchronously, in parallel. And many of these separate synchronous events can be influenced by external forces. Nothing exists in a vacuum.
My computer is a tool, like my desk. I'm rarely working on one thing at once, multiple projects across multiple disciplines. Why should my computer be focused on one task? It shouldn't. It should be spread across many tasks. To do this, it can incorporate many parallel processors.
Now take it in context, among other things I write and work with engineering codes for large clusters, modeling and simulation of real-world phenomena. I work with parallel distributed code on a daily basis. This isn't wow factor for me, this is daily life.
The point is not that it will be faster for highly parallel tasks, but that it is becoming difficult to increase throughput by lowering latency. Intel is looking into multiple parallel processors as a way of increasing throughput, and attempting to develop software design to a point where it is useful.
What do cores matter if the IDE doesn't keep pace?
It wants its systolic systems, its thinking machines and other parallel architectures back.
I think that Intel is making a mistake here by calling upon programmers to solve the problem.
They are the ones who should be making their cores available in usable hardware architectures, but maybe they suffer from NIH, because all worthwhile parallel architectures already exist.
No one will ever need more than 640 cores.
I'd say that it could have a rather hefty impact on the graphics industry (though to be fair, both tend to share tech fairly regularly as it is) as well as many others.
How about servers? If you have 1000 cores, and 1000 clients connecting through the network, then each core could service a client (though depending on what they're doing, IO and other issues also rear their heads). Another nice aspect would be that if you could fix a process to a certain # of cores, you could always be sure that it wouldn't max out your entire CPU capacity.
CPU speed is far outstripping bus and memory bandwidth
One of the issues I'm continually faced with at work is not so much CPU horsepower anymore, but disk I/O. Even with a good RAID setup, there are only so many clients you can service off a single machine at a given time. Remote storage options like iSCSI and other forms of storage arrays can help with this, but I'm not sure that even those are ready for 1000-core machines running as superservers.
I'm running Vista, and DWM (aero) and every background task including your DRM boogeyman (as if it did anything at all when not playing back protected media), uses like 1% of the CPU, so why don't you shut up and learn something instead of spreading your idiotic FUD.
I am not a programmer, just a lowly MS Servers/Photoshop/Photographer/inactive biz atty guy. That being said, these webcasts seem quite good to me, particularly the 4-part parallel computing lecture series. It clearly breaks down the problem from the "computationally and parallelizably trivial" to the real and very hard challenges in problems that are extremely difficult and complex to solve. The lectures are by a master of these issues and of the domain: Geoffrey Fox.
kellybundy@operamail.com is my postable email. (checked only in rare, comatose, delusional spam-loving moments)
Anonymous Coward
------------
begin links:
------------
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 1
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics at Indiana University
February 26, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11073&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 2
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
February 27, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11071&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 3
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
February 28, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11070&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 4
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
March 1, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11069&fID=569
The Stanford Data Stream Management System
http://www.researchchannel.org/prog/displayevent.aspx?rID=4355&fID=569
Parallel Execution Models for Future Multicore Architectures
Guri Sohi, faculty member and chair, Computer Sciences Department, University of Wisconsin-Madison
February 17, 2006
http://www.researchchannel.org/prog/displayevent.aspx?rID=4793&fID=569
SaC: Off-the-Shelf Support for Data-Parallelism on Multicores
Dr. Sven-Bodo Scholz, senior lecturer, University of Hertfordshire
March 30, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11269&fID=569
Stream Programming: Luring Programmers into the Multicore Era
New capabilities with parallel abstraction that simplify application development, becoming more appealing to programmers.
http://www.researchchannel.org/prog/displayevent.aspx?rID=24404&fID=569
The Center for Parallel and Distributed Computation
http://www.researchchannel.org/prog/displayevent.aspx?rID=2380&fID=569
The Google Linux Cluster
Infrastructure of Google web search
http://www.researchchannel.org/prog/displayevent.aspx?rID=2879&fID=569
-----
end links
-----
At the moment my CPU usage never goes much above 50% when compiling no matter how many threads I tell it to spawn.
This suggests I'm either I/O bound (on one of those new-fangled Velociraptor drives...) or stalled because of build dependencies.
I've only got two cores at the moment. Adding more wouldn't necessarily speed up my compile times. YMMV.
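One way to put a number on the "stalled because of build dependencies" guess is to compare the total work in the build with the longest dependency chain, since no amount of cores can finish faster than that chain. A rough Python sketch follows, assuming an acyclic dependency graph; the target names and timings in the demo are invented, and `build_times` is a hypothetical helper, not part of any build tool.

```python
from functools import lru_cache


def build_times(durations, deps):
    """Return (serial_time, critical_path_time) for an acyclic dependency graph."""

    @lru_cache(maxsize=None)
    def finish(target):
        # A target can start only after its slowest prerequisite has finished.
        start = max((finish(d) for d in deps.get(target, ())), default=0.0)
        return start + durations[target]

    serial = sum(durations.values())
    critical = max((finish(t) for t in durations), default=0.0)
    return serial, critical


if __name__ == "__main__":
    # Invented example: two objects, a library built from them, then the app.
    durations = {"a.o": 4, "b.o": 4, "lib.a": 2, "app": 6}
    deps = {"lib.a": ["a.o", "b.o"], "app": ["lib.a"]}
    serial, critical = build_times(durations, deps)
    print("best possible speedup:", serial / critical)
```

In this made-up graph only the two object files can overlap, so even unlimited cores shorten the build from 16 to 12 time units.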
No sig today...
... idle process counting thousands of processor-time seconds PER SECOND. Cool!
This is modded as funny now, but this is almost CERTAINLY where we are headed. The computing world moves fast, and certain new fabrication techniques may move it even faster..... I can see a near future where we have a million quantum cores in our cell phones, and the term super computer literally does not exist any more in any meaningful way, because every computer has twice the computing power of all computers currently existent. I mean, that has happened already since the days of ENIAC, has it not?
Now what is MORE interesting to me is that at some point in the near future I think computer OSes will operate like a cloud and instantly turn all processors (which will all be multicore, eventually) within wifi-z range, or whatever we have at that point, into an instant cluster computer. It's hard to think that far ahead of the current box, and envision exactly how applications and interfaces will work at that point, but I still don't see that as necessarily funny, so much as interesting....
Anyone remember the 10 Deca chip? Blazing fast, but...So hot it burns up.
Niagara 2 is 8 cores, with 4-way SMT per core. Given that each core has multiple functional units, it's very close to being a fully 32-way CPU. It feeds on four dual-channel memory pipes. The servers running these don't need special software to make use of these cores; they just handle lots and lots and lots of user requests per second. For the most part they're webservers and fileservers, but they'd almost certainly make excellent mainframes for large multi-user environments/virtualized systems. All it takes to use this multi-core CPU is a dozen 5-watt 166 MHz thin clients.
Besides multi-user environments, there are already plenty of data-parallel tasks that can use however many cores you give them. Graphics is just one, but neural nets, signal processing, simulation tasks of all kinds: these tasks can all be parallelized to any degree of parallelism you can build. 1000 will be laughably small by 2010; already today you can buy an 800-core card at Best Buy: just ask for an AMD 4870 and pay your $200.
Multimedia apps are resource hogs today and this stuff will easily eat up all 1000 cores. Yes, RAM is a bottleneck, but sound and video processing can be parallelized easily. Voice and image recognition, cryptography, voice synthesis, neural networks, ....
A whole lot of modern tasks could utilize far more than today's 4 cores.
Over 9000? WTF! http://www.youtube.com/watch?v=VJerOY0xqIw
Here be signatures
I sure haven't used any... nor seen any... Is there a list somewhere of software that actually can use more than one core? Just once I'd like to see it in action. None of the software I use will escape a single core in my X2.
You are going to need one bad ass graphics and display setup to view them in task manager
I would think 64,000 cores to be more than anyone would ever need.
We'll probably need 50 cores just to run the next version of Windows!
The problem with thousands of cores is not hardware related. Yes, Intel can build thousands of cores through continued hardware innovation. However, the software community is still struggling to understand how to use even 8 cores. The concurrency issues are enormous. For example, look at the following post: http://kashi.webhop.net/blog/Technology/index.php/archives/22 Until we solve the concurrency issues, increasing the number of cores will not deliver higher performance. C++ is just starting to deal with concurrency. C++0x adds thread support; however, that is just the starting point.
If you use mutexes, semaphores, critical sections etc then parallel programming is indeed hard. These low level primitives should only be used for coding parallel APIs, where the user never sees them.
One such example is the Actor model: objects communicate through messages. It doesn't get simpler than that, and the low level communication primitives are only used at the message queue of each object (because one thread writes the queue, one thread reads it).
Imagine a program written in an object-oriented language where each object is an Actor! If you have 10,000 objects and 10,000 cores, each core can represent one object. Sequential algorithms could then be parallelized automatically, without being rewritten, since a method call becomes a message... There are solutions for parallel programming; it's just that the major programming languages define a culture that is hard to break.
For a successful application of the Actor model in the industry, you can check the programming language Erlang used in telecommunications.
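For anyone who has not seen the Actor idea in code, here is a bare-bones Python sketch: each actor owns a mailbox and a thread, and the only way to affect its state is to send it a message. This is an illustration under simplifying assumptions, not how Erlang implements processes; the `Actor` and `CounterActor` classes are invented for the example.

```python
import queue
import threading


class Actor:
    """An object that owns its state and reacts to messages one at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()    # the only shared, locked structure
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)          # sentinel, handled after earlier messages
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                return
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    def __init__(self):
        self.total = 0                   # touched only by this actor's own thread
        super().__init__()

    def receive(self, message):
        self.total += message
```

Several threads can call `send` at once without any lock in user code, because the locking lives inside the mailbox, which matches the point above about keeping low-level primitives out of sight.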
The point is that this is going to happen, whether anyone likes it or not.
CPU clock speeds ran into the brick wall a few years ago. Here is a chart showing CPU clocks from 1993 to 2005.
There have been no major performance improvements from that direction for the last few years, and probably won't be any more without a major breakthrough in semiconductors.
Moore's law is about transistor counts, and shows no real signs of stopping. Every 18 to 24 months, we double the number of transistors on a given wafer/die. The transition to 64-bit CPUs used a generation or two of those extra transistors, but we aren't likely to move to 128 bits soon. We are already pretty deep into the diminishing-returns curve for on-die cache.
What is left to consume those transistors?
More cores. Lots more cores. If you replace your CPU every 2 years, you can pretty much bet that each one you buy for the next decade or so will have twice as many cores as the one it is replacing.
And if developers and compilers get good at managing parallel code (and they have no choice in this), you can expect core counts to go up even faster than doubling every couple of years.
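The doubling argument above is easy to turn into arithmetic. A tiny Python sketch follows, assuming a 4-core starting point and a strict two-year doubling period; both are illustrative assumptions rather than anything Intel has announced, and `projected_cores` is a made-up helper.

```python
def projected_cores(start_cores, years, doubling_period_years=2.0):
    """Core count after `years` if it doubles every `doubling_period_years`."""
    return start_cores * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 10, 16):
        print(years, "years:", int(projected_cores(4, years)), "cores")
```

On those assumptions a 4-core part reaches 128 cores after a decade and crosses a thousand after sixteen years.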
See that "Preview" button?
So far Intel's processors have 4 cores at most. The PS3's Cell has 8 cores, and Cell hasn't been news for a long time.
A little less conversation, a bit more action please Intel.
"even thousand of cores"
Finally, someone plans to build a CPU that can run Vista!
Tesla chips are ready to ship from NVIDIA (240 cores per chip).
Intel talks up vaporware. Tells to prepare for dozens of cores.
If this is Intel hinting at future products, this would explain Apple's new "Grand Central"...Apple knows about the upcoming super-ultra-parallel chips?
Grand Central is a Snow Leopard feature to make it "much easier for developers to create programs that squeeze every last drop of power from multicore systems."
http://www.apple.com/macosx/snowleopard/
To all developers with a problem: make your problem match our solution!
At least Sun got the idea right with the Niagara-based processors. You have a problem: a high-load web or database server which is inefficient. OK, what are the problems? 1) I/O, 2) energy usage, 3) SSL performance. OK, here you have a CPU with many cores to make sure the I/O is saturated, bolt on two 10 Gbps NICs, and make sure it is nicely underclocked. Add crypto systems to speed up SSL (and change the crypto API in Java to make it work - whoops, almost forgot that part).
Well, that was a very direct approach to solving a problem.
I take exception to that statement.
Video compression, and media processing in general, can scale up to 1000+ parallel threads, although current apps will need to be re-architected. I regularly have my 8-core workstation tied up for 24+ hours doing media processing, so this sounds really good to me!
Current compression products (Rhozet's Carbon Coder is the biggest example) can already scale up happily to 16 and 32 cores.
My video compression blog
I imagine people were having these same types of discussions when we maxed out: electromechanical, relay-based, vacuum tube, and transistor based technologies. The problem is that Integrated Circuits are over. We'll make the jump... Parallel computing is fine, and certainly useful, but one day soon (er than you think) you will have 1 core that is doing 100 PetaFlops with no heat consequences. And if you still wanna put 100 of them on a chip, that's fine too.
With the world looking at ways to lower energy consumption, our industry is retarded if we're going to keep pushing to higher and higher CPU core numbers, higher and higher power consumption, etc.
Is this just the industry's way of giving up and realizing they can't get control of so-called "software engineers"?
Put some leashes on some people, measure PERFORMANCE of code again like we did when we didn't HAVE massive CPU horsepower, and actually work hard on sysadmin goals like properly prioritizing processes running on the hardware?
Think any of that would help a whole lot toward being able to close down a whole lot of data centers?
Won't happen though -- we're humans. We want it now, we want it fast, and we don't care if we have to leave a steaming pile of shit in someone's yard to accomplish it!
Create non-crappy code (even if you have to get underneath the compilers and high level languages to do it) that does the core things people need REALLY WELL perhaps, and say to hell with buying more and more cores from Intel?
I know it's a pipe-dream at this point. Multiple generations of coders haven't analyzed their code for speed/CPU efficiency in almost two decades now. Those folks will never learn how, either. No business motivation to do so.
(Hint: Stop buying hardware and make people use what they have for a while, fall back off the leading edge and wait a bit. Those ideas/words scare Intel to death. They HAVE to sell you "more cores!" or "more GHz!" or "more MIPS!" every year to stay in business, now don't they?)
+++OK ATH
This is a huge problem already.
Our standard Unix box at work is a Sun T5240 which has 16 cores and 128 threads. We just bought a bunch of expensive S/W (7 figures) of which some is licensed for 2 cores. It is getting very hard to buy a 2 core server of any sort - Intel or Unix.
Larry
finally figure out a way to program them that's practical.
You haven't heard of Erlang yet?
you had me at #!
I can't speak for the others, but it's certainly true that Erlang can be used in declarative ways, as its function signatures are patterns which are matched and bound at runtime. Idiomatic Erlang is therefore much shorter than ordinary imperative code (Java, C, ...); some people have estimated by a factor of 4-10.
For an example of declarative style, see my simple minded Tic-Tac-Toe Erlang web application - for example, ttt.erl.
Erlang may not end up being 'the' massively concurrent language of the future, but it's arguably the closest thing we have today, by far. The shift in thinking that it involves will conceptually prepare you very well for a C-core, K-core, M-core future. A properly architected application will transparently scale.
but no programming languages or tools to take advantage of them.
You expect that to come in the CPU box? Good tools exist, but you will have to learn how to use them.
It's already a complete waste of time. Real work is done at HLL or VHLL level.