Intel Talks 1000-Core Processors
angry tapir writes "An experimental Intel chip shows the feasibility of building processors with 1,000 cores, an Intel researcher has asserted. The architecture for the Intel 48-core Single Chip Cloud Computer processor is 'arbitrarily scalable,' according to Timothy Mattson. 'This is an architecture that could, in principle, scale to 1,000 cores,' he said. 'I can just keep adding, adding, adding cores.'"
Imagine a beowulf cluster of ... ah yeah.
I hope he never works for Gillette.
Sometimes, life itself is sarcasm...
From the article: "By installing the TCP/IP protocol on the data link layer, the team was able to run a separate Linux-based operating system on each core. Mattson noted that while it would be possible to run a 48-node Linux cluster on the chip, it 'would be boring.'"
Huh?! Boring?! It would have been a nice first post on Slashdot on the eternal topic - does it run Linux? - to begin with.
Then we have all the programming goodies to follow up with.
...as long as you don't mind that it does nothing useful because of off-chip bandwidth starvation. I fail to see anything in TFA that suggests that problem is solved
Are they trying to reinvent Transputer? :)
But yes, I am happy to see Intel pushing it forward!
Paul B.
This is for server/enterprise usage, not consumer usage. That said, it could scale to the number of cores necessary to make realtime raytracing work at 60fps for computer games. Raytracing could be the killer app for cloud gaming services like OnLive, where the power to do it is unavailable for consumer computers, or prohibitively expensive. The only way Microsoft etc. would be able to have comparable graphics in a console in the next few years is if it were rental-only like the Neo-Geo originally was.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
http://xkcd.com/619/
10 FOR N = 1 to 1000
20 PRINT "Cor!";
30 NEXT N
Would the temperature rise 1000 times more than now?
(Would we need cryogenic coolers?)
Imagine a Beowulf cluster of th^H^H^H
Ah, forget it, the darn thing practically is one already! :/
"Imagine exactly ONE of those" just doesn't sound the same.
Coolest thing: looking at the front panel, you could judge your and others' code by the way the LED bar graphs lit up. Bunch of vertical, and good code. Lots of horizontal, and a lot of comm without much computation going on. Of course, with a little creativity, the display lent itself to cool xmas displays, etc.
http://en.wikipedia.org/wiki/Intel_Paragon
When is it coming to market?
warning pointless sig
Why have 1000 cores when you can have 1 MILLION CORES, (all running applications that can barely take advantage of 1 or 2)
Just how small does your penis need to be to need 1,000 cores? Are we talking ingrown like with monster trucks or just really small? I render CG animation so 1,000 cores has a practical use, but for most we have to be talking bragging rights here. I mean having a 12' tall Toyota Hilux or a 1,000 core computer has to be BYOV, Bring Your Own Vibrator time.
Having been in attendance of this presentation at Supercomputing 2010, for once I can say without a doubt that the article captured the essence of reality. The only part it left out is that the interconnect between all the processing elements uses significantly less energy than that of the previous 80-core chip; I think the figure was around 10% of chip power for the 48-core, and 30% for the 80-core. Oh, and MPI over TCP/IP was faster than the native message passing scheme for large messages.
"It's a lot harder than you'd think to look at your program and think 'how many volts do I really need?'" he [Mattson] said.
First was RAM (640kb should be... doh), then M/GHz, then Watts, now is volts... so, what's next?
(my bet... returning to RAM and the advent of x128)
Questions raise, answers kill. Raise questions to stay alive.
Am I the only one feeling this is just a foray into multicore chips because they hit a brick wall when it comes to faster single core CPUs? While I like the thought of say 8 cores or something, I'd much rather have those 8 cores being faster than having a frigging supercomputer under my desk.
HTTP/1.1 400
Er, yeah, pretty much everyone knows they have no practical way to make the clock speed much faster. The only thing they can do is proliferate cores beyond all reason. Nobody has the slightest idea how to take advantage of that many cores in normal household use and even most workstation use.
Intel would not have (presumably!) to re-invent *Intel* Paragon! :)
We can throw a Connection Machine in there, and really date ourselves -- but it's still nice to know that finally CMOS tech has caught up with late 80s comp. arch. advances!
And then, do not get me started on the original Tera, with its multithreading it seemed to be much better bang for the buck of chip real estate than currently accepted multicore solutions. But what would I know...
Paul B.
Again...
Alternatively, NUMA on a single CPU (different memory channels connected to different cores).
It would be a bitch to program (but fun nevertheless).
Given that for years GPUs have had hundreds of processors (the power of CUDA is awesome!), this is long overdue from lazy CPU designers like Intel....
I took an intro to ECE class last fall that was basically just a parade of people coming in and talking about the kinds of things that they do as an engineer. One of the speakers talked about how one could have all of these cores, but that coding to take advantage of all of them was such a difficult task that it's hard to find any software that takes advantage of the few cores we're shipping today, let alone a hundred cores or a thousand cores. Apparently he was working on a project - a sort of wrapper? I think he mentioned AI but I don't know if he was just blowing smoke up our ass at that point - to help streamline writing for thousands of cores. I don't know how much truth is in that but I found it interesting, and would love to hear from someone who actually codes these kinds of things.
With all that heat, it would be nice to have a skillet that could cook a sandwich or eggs, or brew coffee. I lived on a Mr. Coffee machine for over 3 years, boiling vegetables or tea. It kept the room warm and provided the occasional hot towel bath; my only regret is that its heat source was a wasteful heating element rather than an embedded computer. I know some people used a self-throttling Pentium 4 to boil food from their waterblock and such. Why not?
Make a processor with four asses.
This just goes to show that if you care about having a future career (or even just continuing with your existing one) in programming, Learn a functional language NOW!
I dream of a nation where a man is not judged by his skin color but by a number assigned by a credit rating agency.
Would be interesting to know if this helps with performance/power ratio against (potential many core/cpu) ARM servers.
1000 cores on a chip isn't too bad. I already have one with 110 cores.
That's only 10 more cores!
I wonder how the inter-core communication will scale without packing 1000+ layers in the die.
Maybe Computers will never be as intelligent as Humans.
For sure they won't ever become so stupid. [VR-1988]
"Performance on this chip is not interesting," Mattson said. It uses a standard x86 instruction set.
How about developing a small efficient core, where the performance is interesting? Actually, don't even bother; just reuse the DEC Alpha instruction set that is collecting dust at Intel.
There is no point in tying these massively parallel architectures to some ancient ISA.
This be called a coretex, mr. data, or possibly a cluster-fuck
Probably in the future 1 million cores will be the minimum requirement for applications. We will then laugh at these stupid comments...
Image and audio recognition, true artificial intelligence, handling data from a huge number of different kinds of sensors, movement of motors (robots), data connections to everything around the computer, virtual worlds with thousands of AI characters with true 3D presentation... etc...etc... will consume all processing power available.
1000 cores is nothing... We need much more.
Peter: By gluing many razorblades to this ordinary desk fan, I'll save time in my morning routine!
*Peter turns on the fan and moves it closer to his face as the camera changes to a view of the wall through a window. Peter screams, blood spatters on the wall.*
Peter (offscreen): Lois, I done it again!
And here's a crappy YouTube link to that scene. No idea how long it will last: http://www.youtube.com/watch?v=CKHY4OsAPc8
Okay, I'm sure some high-end consumers would benefit from this, I think the majority of consumers will not. The number of multithreaded programs on my Windows computer can be counted on one hand I think. Java being the major one, if and only if the programmers want to program multithreaded.
At this point in time I'd rather have a dual core 3 GHz processor than a quad or octa core 2 GHz processor.
It's an interesting machine. It's a shared-memory multiprocessor without cache coherency. So one way to use it is to allocate disjoint memory to each CPU and run it as a cluster. As the article points out, that is "uninteresting", but at least it's something that's known to work.
Doing something fancier requires a new OS, one that manages clusters, not individual machines. One of the major hypervisors, like Xen, might be a good base for that. Xen already knows how to manage a large number of virtual machines. Managing a large number of real machines with semi-shared memory isn't that big a leap. But that just manages the thing as a cluster. It doesn't exploit the intercommunication.
Intel calls this "A Platform for Software Innovation". What that means is "we have no clue how to program this thing effectively. Maybe academia can figure it out". The last time they tried that, the result was the Itanium.
Historically, there have been far too many supercomputer architectures roughly like this, and they've all been duds. The NCube Hypercube, the Transputer, and the BBN Butterfly come to mind. The Cell machines almost fall into this category. There's no problem building the hardware. It's just not very useful, really tough to program, and the software is too closely tied to a very specific hardware architecture.
Shared-memory multiprocessors with cache coherency have already reached 256 CPUs. You can even run Windows Server or Linux on them. The headaches of dealing with non-cache-coherent memory may not be worth it.
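To make the "disjoint memory per CPU, run it as a cluster" approach above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not Intel's programming model: the names `worker` and `cluster_sum` are hypothetical, and threads plus a queue merely stand in for cores and an on-chip message channel. The point is the discipline: each "core" touches only its own private copy of the data and communicates by explicit messages.

```python
import queue
import threading

def worker(core_id, private_slice, outbox):
    # Each "core" reads only its own private data; the result leaves
    # through an explicit message, never through shared state.
    outbox.put((core_id, sum(private_slice)))

def cluster_sum(data, n_cores):
    """Sum data by splitting it into disjoint per-core slices (hypothetical helper)."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    outbox = queue.Queue()
    chunk = (len(data) + n_cores - 1) // n_cores  # ceiling division
    threads = []
    for core_id in range(n_cores):
        # Copy the slice so no two workers ever alias the same memory.
        private_slice = list(data[core_id * chunk:(core_id + 1) * chunk])
        t = threading.Thread(target=worker, args=(core_id, private_slice, outbox))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Exactly one message per core is waiting in the outbox.
    return sum(outbox.get()[1] for _ in range(n_cores))

print(cluster_sum(list(range(1000)), 48))
```

Because nothing is shared, no cache coherency is needed for correctness; the cost is that every exchange of data has to be an explicit message.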
Ok, you can cram 1000 cores into one CPU chip - but feeding all 1000 CPU cores with enough data for them to process and transferring all the data they spit out is gonna be a big problem. Things like OpenCL work now because the high end GPUs these days have 100GB/s+ bandwidth to the local video memory chips, and you're only pulling out the result back into system memory after the GPU did all the hard work. But doing the same thing on a system level - you're gonna have problems with your usual DDR3 modules, your SSD hard disk (even PCI-E based) and your 10GE network interface.
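A back-of-envelope version of the bandwidth worry above, as a small Python sketch. The 100 GB/s figure is the GPU memory bandwidth quoted in the comment; the even split across cores and the function name `per_core_bandwidth_gb_s` are assumptions made purely for illustration.

```python
def per_core_bandwidth_gb_s(total_gb_s, cores):
    """Bandwidth each core sees if the memory link is shared evenly (an assumption)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gb_s / cores

for cores in (48, 1000):
    share = per_core_bandwidth_gb_s(100.0, cores)
    print(f"{cores} cores: {share:.2f} GB/s per core")
```

Even with GPU-class memory bandwidth, the per-core share drops by roughly a factor of twenty going from 48 to 1000 cores, which is the starvation problem the comment describes.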
It seems like I've been here before.
Seastead this.
The key difference between this research chip and the other Multicore chips Intel have worked on, like Larrabee, is that it is explicitly NOT cache coherent, i.e. it is a cluster on chip instead of a single-image multi-processor.
This means, among many other things, that you cannot load a single Linux OS across all the cores, you need a separate executive on every core.
Compare this with the 7-8 Cell cores in a PS3.
Terje
"almost all programming can be viewed as an exercise in caching"
64 Cores ought to be enough for anyone...
Am I the only one feeling this is just a foray into multicore chips because they hit a brick wall when it comes to faster single core CPUs?
For many years (at least 5, possibly more) Intel has been telling developers that future performance gains will come from multithreading not faster clock speeds. So no, you are not the only one feeling this way. :-)
The first time was the i432 http://en.wikipedia.org/wiki/Intel_iAPX_432 Anyone remember that hype? Got to love the first line of the Wikipedia article "The Intel iAPX 432 was a commercially unsuccessful 32-bit microprocessor architecture, introduced in 1981."
The second time was the Itanium (aka Itanic) that was going to bring VLIW to the masses. Check out some of the juicy parts of the timeline also over on Wikipedia http://en.wikipedia.org/wiki/Itanium#Timeline
1997 June: IDC predicts IA-64 systems sales will reach $38bn/yr by 2001
1998 June: IDC predicts IA-64 systems sales will reach $30bn/yr by 2001
1999 October: the term Itanic is first used in The Register
2000 June: IDC predicts Itanium systems sales will reach $25bn/yr by 2003
2001 June: IDC predicts Itanium systems sales will reach $15bn/yr by 2004
2001 October: IDC predicts Itanium systems sales will reach $12bn/yr by the end of 2004
2002 IDC predicts Itanium systems sales will reach $5bn/yr by end 2004
2003 IDC predicts Itanium systems sales will reach $9bn/yr by end 2007
2003 April: AMD releases Opteron, the first processor with x86-64 extensions
2004 June: Intel releases its first processor with x86-64 extensions, a Xeon processor codenamed "Nocona"
2004 December: Itanium system sales for 2004 reach $1.4bn
2005 February: IBM server design drops Itanium support
2005 September: Dell exits the Itanium business
2005 October: Itanium server sales reach $619M/quarter in the third quarter.
2006 February: IDC predicts Itanium systems sales will reach $6.6bn/yr by 2009
2007 November: Intel renames the family from Itanium 2 back to Itanium.
2009 December: Red Hat announces that it is dropping support for Itanium in the next release of its enterprise OS
2010 April: Microsoft announces phase-out of support for Itanium.
So how do you think it will go this time?
Why is Snark Required?
How about developing a small efficient core, where the performance is interesting? Actually, don't even bother; just reuse the DEC Alpha instruction set that is collecting dust at Intel. There is no point in tying these massively parallel architectures to some ancient ISA.
Technically the cores are not executing x86 instructions. For several architectural generations of Intel chips the x86 instructions have been translated into a small efficient instruction set executed by the cores. Intel refers to these core instructions as micro-operations. An x86 instruction is translated on the fly into some number of micro-ops and these micro-ops are reordered and scheduled for execution. So they have kind of done what you ask; the problem is that they don't give us direct access to the micro-op instruction set.
Intel tried to move beyond x86 with the Itanium and the market said no. The market also said no to Alpha and PowerPC, both of which had consumer oriented Windows NT 4 support. Even Apple had to give up on PowerPC and they were part of the PowerPC consortium. There is no Intel x86 conspiracy, they are trapped too.
I bet you're one of those geeks who sneers at Windows and Mac users and thinks he's really clever because he uses Linux, but is 30 years old and still works at Maplins or some other vaguely nerdy retail job.
In Soviet Russia, 1000-blade project cuts you!
Then you die.
Do 1024 cores constitute a kilocore? Or 1000? I'd love to see that debate move from hard disks to processors.
Bingo Dictionary - Pragmatist, n. A myopic idealist.
Get me as many cores as needed so Windows will stop pausing to open a folder even on a freshly formatted computer. Instant, instant instant....
Okay, I'm sure some high-end consumers would benefit from this, I think the majority of consumers will not.
As a game developer I have to say consumers could benefit. And no I am not necessarily thinking about more graphical eye candy. For example I would like to have hundreds of cores working on AI for computer controlled characters/units.
Would that be the German branch of Intel?
Apologies to all Germans reading. I just couldn't resist.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
This model does allow for 1000 times the BSOD dosage!
Isn't that Intel's pet project for the last decade?
No sig today...
Related link: Intel Says to Prepare For "Thousands of Cores"
Post the related links you fucking incompetents.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Here we are talking about 1000x cores when the best consumer and business applications can use "maybe" 10 effectively....
I'm bad at the whole car analogy thing, but I'm sure someone out there has one that would fit here nicely.
I will need to buy a pair of sunglasses, and crush them when I find that the new Intel processor has over 9000 cores.
Isn't it just a #define in the source code?
Obligatory XKCD ref: http://xkcd.com/619/
Because of the limited number of instructions, RISC needs more instructions for a logical operation, e.g. multiply (although many RISC CPUs have that operation), so you have to load more bytes from RAM to do the same thing a CISC instruction does in fewer bytes. As CPU speed vs. RAM/bus speed is skewed, it's more efficient to have instructions which take maybe a few more bits each (on average they don't take that much more) and have on-die microcode to handle them, instead of loading a lot of RISC instruction bytes from RAM for basic operations a CISC can do through microcode. As long as the memory/bus speed is slower than the CPU speed (unlike on the PS3, where memory/bus runs at 3 GHz, equal to the CPU), RISC isn't always more optimal.
Never underestimate the relief of true separation of Religion and State.
The paper referenced in the article can be found here.
Fascinating that MPI works that well unmodified.
sounds like a coarse-grained FPGA haha!
I hear a truckload of Kleenex just got delivered to Ellison's office when he heard this news.
Calling someone a "hater" only means you can not rationally rebut their argument.
There's a world of difference between massive # of regular cores--which, if harder to program for is well-understood--and the Itanium, which introduced a whole new concept with its EPIC architecture. The EPIC architecture seemed like a good idea--let the compiler take care of most of the instruction re-ordering, and get rid of branch predictions where at all possible by introducing speculative instructions in its stead. But as it turned out, writing a good compiler for this architecture is hard if not impossible...
But can you keep adding memory links? and IO links?
As 1000 cores may be cool, but to make full use of them you may need 6-12+ RAM channels and maybe 2+ QPI links. RAM is sometimes more needed than IO, but if you are working with a lot of data then you may need 1 QPI link just to the SSD bank / RAID system.
According to Intel it's Single-chip Cloud Computer. :P
tlax says: "Lol".
The article is talking about targeting 1000 cores per chip (in x86 made efficient by fancy translating filters that consume chip real estate faster than Hummers consume gas).
Man, you're insane.
And I guess you don't believe in dust. Or maybe you don't believe testing processors costs money.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
How about we strike a balance? I say a roughly logarithmic curve of processors/power of each processor ratio. Take 1 very, very powerful core, then 2 cores half the power of that, then 4 cores half the power of those, then 8, then 16, then 32, then 64, then 128, then 256, then 512. At that point, you have over 1000 cores, and have the ability to do anything you want with ridiculous speed and power, be it rendering thousands of simple tasks, or burning through a single mammoth thread, and everything in between.
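The arithmetic behind that tiered layout can be checked with a few lines of Python. This is just a sketch of the proposal as stated: `tiered_cores` is a hypothetical helper, and treating "power" as throughput that adds up linearly across cores is an assumption.

```python
def tiered_cores(levels=10):
    """Tier k holds 2**k cores, each at 1/2**k of the top core's speed."""
    return [(2 ** k, 1.0 / 2 ** k) for k in range(levels)]

tiers = tiered_cores()
total_cores = sum(count for count, _ in tiers)
# Aggregate throughput in units of the single fastest core.
total_throughput = sum(count * speed for count, speed in tiers)
print(total_cores, total_throughput)
```

Ten tiers give 1023 cores, but under the linear assumption each tier contributes only as much as the one big core, so the whole chip totals about ten big cores' worth of throughput.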
Where is the mod rating for "scary"? Also,
Still have compiler writers that don't understand that unrolling code is not usually a real win, overall.
And the Itanium was designed for exactly that kind of optimization, as if a compiler is always supposed to be able to predict execution path in real-time execution.
Kind of like the time I tried to write a user interface in CoBOL.
err, barking up the right tree.
But they are still barking.
x86!
Marketing magic will always prevail over reality!
(That's what Moore's law really said.)
My laptop has 10 cores, quad (100) cores are common, and I have seen many servers with 1000 or more...
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
I'd call it more of a testament to how much Intel's fanaticism can induce them to waste all the benefits of Moore's law supporting baggage that was unnecessary when the x86 was "invented".
Just for the marketing department's black magic.
Instruction efficiency? Compact code? There are numerous processors that wax the floor with x86 in those departments, but marketing department's black magic killed the market.
Magic? It's all parlor tricks, you know, pay a researcher here to slip a little excess code in a tight loop on that 68k "benchmark", that sort of thing. The problem with the old saw about magic being indistinguishable from advanced tech is that magic is not about real results. Magic is about illusion. The confusing point is that illusion can be turned into reality with some effort.
In the x86 case, it was a huge lot of effort justified by a huge load of hubris and the needs of the black magic department, a vicious cycle.
x86 is a significant contributor to global warming (which is part of the reason some people want to deny the reality of human impact on the climate changes).
has interesting ideas and it is difficult to follow them :)
IMHO the biggest problem with these multi-core chips is the lock latency. Locking in heap all works great, but a shared hw register of locks would save a lot of cache coherency and MMU copies.
A 1024 slot register with instruction support for mutex and read-write locks would be fantastic.
I'm developing 20+Gbps applications - we need fast locks and low latency. Snap snap!!!
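As a rough idea of what such a lock register might look like from software, here is a small Python emulation. Everything in it is hypothetical: no such instruction is described in the article, the `LockRegister` class is invented for illustration, and an ordinary `threading.Lock` stands in for the atomicity the hardware would provide. It covers only mutex-style test-and-set, not the read-write locks mentioned above.

```python
import threading

class LockRegister:
    """Software stand-in for a hypothetical 1024-slot hardware lock register."""

    SLOTS = 1024

    def __init__(self):
        self._bits = 0                   # one bit per lock slot
        self._atomic = threading.Lock()  # emulates hardware atomicity

    def _mask(self, slot):
        if not 0 <= slot < self.SLOTS:
            raise IndexError("slot out of range")
        return 1 << slot

    def try_acquire(self, slot):
        """Test-and-set: True if the slot was free and is now held."""
        mask = self._mask(slot)
        with self._atomic:
            if self._bits & mask:
                return False
            self._bits |= mask
            return True

    def release(self, slot):
        """Clear the slot's bit so another core can take it."""
        mask = self._mask(slot)
        with self._atomic:
            self._bits &= ~mask

reg = LockRegister()
print(reg.try_acquire(7), reg.try_acquire(7))  # second attempt fails while held
reg.release(7)
print(reg.try_acquire(7))
```

The appeal of doing this in a dedicated register is that lock state never has to bounce between caches; the emulation above only shows the interface, not that latency benefit.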
I said no... but I missed and it came out yes.
Talk is cheap, show me the cores.
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog
Intel, you already have trouble keeping pipelines full on your six-core chips. You can't scale the RAM interface up to fit your 1000-core chip. I'm not saying it is impossible, but I'm saying that you as a company have shown you don't know how, even given the economic incentive to do so. You're talking about eliminating cache (but maybe not Icache), but that makes the memory bandwidth problem worse even though it fixes the coherency problem.
While it's true that GPGPUs have lots of streaming processors, they have a different model than x86, and standalone programs are pushed into the processors in chunks. And x86 pulls programs into its pipeline a few instructions at a time. Frankly, it is going to be a mess.
http://www.nexstor.co.uk/prod_pdfs/SGI%20Altix%20UV%201000.pdf
AFAIK, the largest number of cores supported in a single system image is 2048 in SGI Altix UV (Ultraviolent me droogs).
Is there hardware out there that supports more than 2K cores in a single system image and has 2.6 been tested on it?
Photoshop is STILL stuck at 2 processors... Software needs to pick its butt up... it's been lagging behind hardware for too long now.
Photoshop has been stuck at 2 processors for way too long. Software companies have been lagging behind hardware far too long. Until I see more software taking advantage of more than 1 or 2 cores... I'm not wasting money on them.
According to benchmarks, a functional language like Erlang is slower than C++ by an order of magnitude. Sure, it can distribute processing over more cores, which is the only thing that enabled it to win one of the benchmarks. I suspect that was only because it used a core library function that was written in C. So no, if you want to write code with acceptable performance, DON'T use a functional language. All CPU intensive programs, like games, are written in C or C++; think about that.
I always remember the Intel i860. Another attempt to create a graphics processor (or coprocessors as they were called back then). It had special instructions for performing combo Z-buffer and color buffer test and writes as well as vector processor instructions. They made it into early SGI workstations.
The full-page glossy advert pages of BYTE magazine used to have these pictures of really impressive (at the time) systems with transputer/i860/TMS34020 boards. Some with their own network and hard disk drive ports (the PC was too slow at the time to handle the data transfer). But every time a board came out, six months later, CPUs would have caught up and these boards/chips would become known as "graphics decelerators".
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
There's a good analysis over at the daily circuit that dissects what is being suggested by Intel, who is not just talking about the death of within-node shared memory, but also of intercore asynchronous message passing. I liked the comparison with sending emails with large video attachments, instead of YouTube links, and then requiring the recipient to clear the inbox before a new email can be received. http://www.dailycircuitry.com/2010/11/intel-talks-kilocore-processors.html
I agree, either Pentium 4 or Itanium should boil 2 pots at the least, maybe 3, not only 1.
I remember installing Wing Commander on a Pentium processor. Normally it ran on a 486. It sped up the game. By about 20 times. You launched and you were half a map away from the combat before you could turn around. When you were pointed at it you held down the trigger and flew through microsecond-long explosions. Then you were half the map away again. You got used to it though.
You'd sort of expect that, with all the processor enhancements since, that Microsoft Office would open faster than in 1995. But you know what, that speed of opening scaled fairly well -- just a few seconds then, a few seconds now. Not sure what happened with Office XP. I'm thinking 1000 cores won't save my Firefox from taking up 500 MB of memory so I'm still out of luck there.
Patience, young whatever-you-are.
No, you're right: Intel should go ahead and start building a one million-core chip now. We need it now to...uh....
"Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
A dozen+1 cores that can be individually started and stopped to conserve power would be a very interesting system if the memory system could keep up. Having 1000 cores is nearly insane except for some special cases (like GPU functions).
The market will tip if the GPU folk can open up for general purpose functions eliminating a need for faster "main" processors.
I seem to recall a paper (sorry, no citation) that stated anything above 75 "cores" runs into performance issues due to storage serialization and the amount of time it takes to serialize internal instruction buffers. The paper was done by a well-respected author and has not (to the best of my knowledge) been proven wrong.
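The uncited paper's model isn't available here, but the general shape of a serialization limit can be illustrated with Amdahl's law, which is a different and much simpler model swapped in for illustration. In this Python sketch the 1% serial fraction is an arbitrary assumption, not a figure from the paper or the article.

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup when only the non-serial part scales with cores."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (8, 75, 1000):
    print(cores, round(amdahl_speedup(0.01, cores), 1))
```

With even 1% of the work serialized, the speedup can never exceed 100x however many cores are added, so the step from 75 to 1000 cores buys far less than the core count suggests.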