Intel Shows 48-Core x86 Processor
Vigile writes "Intel unveiled a completely new processor design today that the company is dubbing the 'Single-chip Cloud Computer' (previously codenamed Bangalore). Justin Rattner, the company's CTO, discussed the new product at a press event in Santa Clara and revealed some interesting information about the goals and design of the new CPU. While terascale processing has been discussed for some time, this new CPU is the first to integrate full IA x86 cores rather than simple floating point units. The 48 cores are set two to a 'tile', and each tile communicates with the others via a 2D mesh network capable of 256 GB/s rather than through a large cache structure."
...or perhaps a megacore?
Why can companies not come up with decent code names? For instance, this would be the perfect case for it being codenamed "Beowulf".
Take it to the limit, everybody to the limit, come on, everybody fhqwhgads.
Intel is an American company; with the American economy in the shape it's in, I am offended at the codename Bangalore.
manticore?
Windows 12 will require a minimum 48-core 1.4 THz processor and 8TB of RAM. Microsoft are already planning for this one.
What the heck? Just, what the heck?
They may work fine in towers but drop right out of side mounted desktops and small media units.
This seems like it would be very related to their Larrabee GPU project.
Can someone elaborate on why you'd want 48 full processors, rather than a processor with two (dual) or four (quad) "cores"? (I'm presuming core in this case == FPU in the article.) Supposedly Win7's SMP support becomes much more effective at the 12-16 core threshold.
moox. for a new generation.
Why is everything called cloud these days? Yet another du jour buzzword. Is this really justified here?
Now imagine a beowulf cluster of these things...
Only 48 cores? I'd ask them to double that, but reasonably, 64 cores should be enough for anybody.
If you can read this, I forgot to post anonymously.
With 48 processors you can have your system 98% idle running your typical application at full speed rather than just 50% or 75% idle as is the norm now.
Imagine a Beowulf Cluster of These !!
because now school administrators only have to install SETI@HOME on 100 48-core computers instead of 5000 standard computers.
Manticore. Mmm. Manticore... Jessica Alba? I'll cast my vote for manticore any day of the week with Jessica Alba in there... [Dark Angel (Comic/Show) references? Yes, I went there... I'll do it again, too!] Personally, though, I think 48 cores in one proc are enough to float my boat... Then, too, so could Ms. Alba... --Stak
Holy happy hippy crap!
So, which operating systems are designed to take advantage of that many processors? Also, is it just me or will this make Microsoft's per-core licensing policies really expensive?
Can it run Crysis?
....But can it run Crysis....
This new Cloud processor should create synergies with my SOA Portal system and allow me to deploy Enterprise B2B Push based Web 2.0 technologies!
Maybe if Nvidia partners with Intel and develops a new GPU out of this, it will handle Crysis 2!
Unlike Stanford University, UCSB lacks the money to build a full-blown multiprocessor system. If UCSB had such a system back in the 1990s, then UCSB would likely have produced as much multiprocessor research as Stanford University.
This 48-core processor chip, due to the fact that it will eventually be a commercial product mass-produced by the millions of units, will be economically cheap. This chip will enable UCSB to build or buy a cheap multiprocessor system.
A bunch of graduate students is already salivating at the prospect. They are drooling.
Is there enough cpu to chipset bandwidth to make use of all this cpu power?
More info at:
http://www.sun.com/processors/UltraSPARC-T2/specs.xml
Intel ought to focus. They need to focus more on CPU rather than Larrabee, which is an obvious mistake.
Muchas Gracias, Señor Edward Snowden !
Here's the Wired story.
http://www.wired.com/gadgetlab/2007/08/64-core-chips-a/
Zomg. Twice as fast as Helmer and probably twice as expensive.
There, fixed that for you.
This space for rent. All reasonable inquiries will be entertained at proprietors discretion.
A beowulf cluster of those!
(yes, yes, I'm old, but old memes are sticky)
Maybe Bill Gates will say, "48 cores is more cores than anyone will ever need." Cor blimey.
48 what cores ?
Will a chip with 48x 486 CPUs be of any use today ?
How much L2 cache in each core ? 64Kb ?
It does sound a lot like it. Truth is that it is probably a lot more like the old Pentium D packages but still kind of interesting.
So how many Cortex-A8 cores could you fit on one of these?
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
If you need very little data per core but are executing sick calculations, then yes. But probably not anything realistic...
Here be signatures
hardcore
Ceci n'est pas une
Insightful WTF? If you get offended that easily, you'd better:
Mummy?
Are you my mummy?
Mummm-myyy...
Bow-ties are cool.
Well, the current solutions don't seem bandwidth starved, looking at the dual-channel vs triple-channel Nehalems. With a setup like that you could probably do multiple memory controllers and NUMA too, if you needed to, so I imagine there'll be enough.
Live today, because you never know what tomorrow brings
for Windows 8!
Give a hand, not a hand-out.
Sounds like a Transformer.
Optimus Prime: What's for dinner?
Mega Core: Electricity
Single-chip Cloud Computer
Wow. That actually caused physical pain in my frontal lobe. Way to live the corporate buzzword stereotype, Intel.
"ARM states that the Cortex-A8 occupies up to 3 mm^2 when fabricated in a 65 nm process." (Source).
Each dual core "tile" is 3mm^2. So only 1 per tile, or 24.
Is if some of the cores are only allowed to perform menial tasks (they were born that way) and the rest of the cores will only do something if you slip them a little cash. Oh, and code with comments doesn't run.
When combined, the 48 cores approach the processing power of one i7...
In the beginning, there was null.
Don't forget to leverage turn-key best-of-breed uhh... consumer-focused... enterprise social matrix uh... what are we selling again?
Well, it could just be that Intel wants to sell a really, really big e-peen to business decision makers ;-)
(You know, the ones with short, pointy hair)
Sun's processors are heavily multi-threaded per core. It is an 8 core CPU where each core can handle 8 threads in hardware. Intel's solution is 48 separate cores, doesn't say how many threads per core.
The difference? Well lots of threads on one core leads to that core being well used. Ideally, you can have it such that all its execution units are always full, it is working to 100% capacity. However it leads to slower execution per thread, since the threads are sharing a core and competing for resources.
Something like Sun's solution would be good for servers, if you have a lot of processes and you want to avoid the context switching penalty you get from going back and forth, but no process really uses all that much power. Web servers with lots of scripts and DB access and such would probably benefit from it quite a lot.
However it wouldn't be so useful for a program that tosses out multiple threads to get more power. Like say you have a 3D rendering engine and it has 4 rendering threads. If all those threads got assigned to one core, well it would run little faster than a single thread running on that core. What you want is each thread on its own core to give you, ideally, a 4x speed increase over a single thread.
So in general, with Intel's chips you don't see a lot of threads per core. 1 and 2 are all they've had so far (P4s and Core i7s are 2 threads per core, Core 2s are 1 thread per core). They also have features such as the ability for a single core to boost its clock speed if the others are not being used much, to get more performance for one thread and still stay in the thermal spec. These are generally desktop or workstation oriented features. You aren't necessarily running many different apps that need power, you are running one or maybe two apps that need power.
As for this, well I don't know what they are targeting, or how many threads/core it supports.
How many gigaflops will this sucker do? How is parallel programming done? Is it standard multi-threading, or something else? What's the expected cost of these babies? Bottom line me here.
64 cores ought to be enough for anyone.
I'd be more impressed by a 6GHz quad-core though.
If something like this works like I think it should, there probably wouldn't be a need for a GPU at all, just a rasterizer. Everything would be done on the host CPU, likely with power to spare.
Fifty watts per channel, baby cakes.
Is there enough cpu to chipset bandwidth to make use of all this cpu power?
That's really going to depend on the intended use. And on whether the intended use involves problems that a) can be efficiently parallelized, and more importantly, b) actually have been efficiently parallelized. But unless each core gets its own memory bus and its own dedicated memory with its own cache, I rather expect that the only things that are going to be parallelized to their maximum potential are wait states. All that said, it will still probably run faster than a two- or four-core CPU for many tasks, but it won't be running 48 times faster. I would not, however, refuse a manufacturer's sample if one was handed to me. ;)
On the positive side, if this beast actually makes it to market, it might help spur the development of new parallel software.
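To put rough numbers on "faster, but not 48 times faster", here is a minimal sketch using Amdahl's law. That model is my own choice for illustration, not something from the article, and the parallel fractions are made-up values rather than measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the work that can run in parallel;
    cores (n) is the number of cores. Memory and bus contention are ignored,
    so a real chip would do worse than this.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.50, 0.90, 0.99):
        print(f"p={p:.2f}: {amdahl_speedup(p, 48):.2f}x on 48 cores")
```

Under this model, a job that is 90% parallel gets well under 10x from 48 cores, and that is before any memory bottleneck is counted.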
Proud member of the Weirdo-American community.
That uses existing IA86 core technology..
Marketing guys are smoking too much 'cloud', I think.
---- Booth was a patriot ----
640 cores ought to be enough for anybody!
I would actually say the opposite: Intel needs to focus *MORE* on something Larrabee like than less. AMD bought ATI for that exact reason, and honestly ATM it seems to be the one division where AMD is staying competitive or ahead of the competition.
In my experience Windows 7 64-bit is noticeably faster with a NUMA configuration (the Windows Experience Index is significantly higher because of improved memory throughput) and the majority of applications also run up to 10% faster.
I don't know if this is because of Nehalem Xeon CPUs having faster access to CPU local memory in NUMA configuration or if windows is also optimized for this?
As the island of our knowledge grows, so does the shore of our ignorance.
...the crappy x86 architecture.
Oh what people with no spine can achieve... Like no change at all because of fear of not being loved anymore. Or like adding Clippy to your Office suite for the same reason. Or imitating MS Office with your Office suite just to be loved. Or imitating the main OS, for the same reason.
Instead of having the balls to stand behind what you think for a decade: “Oh boy is that a piece of outdated shit, I wish we could replace it by something that actually fits the decade!”
P.S.: No, mentioning a bad architecture, like Itanium, is not going to put a dent into that argument. ^^ Just like acting as if switching to a good architecture and making a clear cut would be mutually exclusive, which they are not. Also even a great concept can fall, if those who should support it, have no spine, and cave in to the retards and uninformed, despite knowing that it’s a great concept.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
I think it's more likely we'll see kibicores and mebicores.
Wake me up when the processor is $100
When all is said and done, nothing changes...
The 48-core chip that Intel demonstrated is 45nm!
Also, Cortex-A9: "For 2000 DMIPS of performance when designed in a TSMC 65 nanometer (nm) generic process the core logic costs less than 1.5 mm^2 of silicon." ( http://www.arm.com/products/CPUs/ARMCortex-A9SingleCore.html ) So it seems "up to 3 mm^2" in your quote really means "up to" (and for a much older core of course, when it was just launching 4 years ago)
And Cortex-A9 "consumes less than 250mW per core"...
One that hath name thou can not otter
It took you some release cycles too long to be original: MegaCore.
If one fails, does that make it a mebinotcore?
At this level of parallelism, it seems to me that the routing/switching and data management between the cores will become far more important than the raw number of cores or how fast each core is. Very similar to cluster computing in that the topology of the cluster (and the interconnect bandwidth & latency) is just as important as the power of each individual node.
Projects like Grand Central Dispatch are a good step in the right direction to making general-purpose computing reasonably multithreaded, but the chip itself still has to deal with shuffling all that data around. If there's any hope of keeping the pin count within modern package limitations (e.g. socket 1366) then either a solid percentage of the die's real estate will have to be devoted to interconnect & routing logic, or some serious compromises will have to be made.
BTW, can anyone explain how this architecture is substantially different from Larrabee, which is also a whole mess of x86 cores on one die?
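On the topology point, a rough sketch of how hop counts grow in a 2D mesh. The summary only says 48 cores at two per tile (so 24 tiles) on a 2D mesh; the 6x4 grid shape and the plain dimension-ordered (Manhattan distance) routing below are assumptions for illustration, and `mesh_stats` is a hypothetical helper.

```python
from itertools import product


def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Hop count between two tiles, assuming shortest-path X-then-Y routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_stats(cols: int, rows: int) -> tuple[int, float]:
    """Return (worst-case hops, average hops) over all pairs of distinct tiles."""
    tiles = list(product(range(cols), range(rows)))
    distances = [hops(a, b) for a in tiles for b in tiles if a != b]
    if not distances:  # a single tile has nobody to talk to
        return 0, 0.0
    return max(distances), sum(distances) / len(distances)


if __name__ == "__main__":
    # Assumed layout: 24 tiles arranged 6 wide by 4 high.
    worst, average = mesh_stats(6, 4)
    print(f"worst case: {worst} hops, average: {average:.2f} hops")
```

The worst case is corner to corner, so latency depends on where the communicating threads are placed, which is the same placement problem you get on a cluster.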
Imitation *is* the sincerest form of flattery.
Once the designers figure out that simple multi-core architectures with lots of external/internal bandwidth and fixed partitioning of processes work, can we finally get on with the move away from the CPU-centric, one-processor mindset?
...doesn't it? Multiple cores, strongly interconnected? What have Intel done that is new here?
"... and more and more now there are all kinds of electronic goodies available" -- Pink Floyd 1972
It's probably rude to point this out, but 2k cores is roughly 43 of these. That's 11 4 socket servers. Less than 1/2 a rack using blades. That's pretty small for a top500 system.
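The arithmetic behind that, as a quick sketch. Reading "2k" as 2048 is an assumption (it is the reading that gives 43 chips), and `chips_needed` and `servers_needed` are hypothetical helper names.

```python
import math

CORES_PER_CHIP = 48      # from the article
SOCKETS_PER_SERVER = 4   # from the comment above
TARGET_CORES = 2048      # assumption: "2k" taken as 2048


def chips_needed(target_cores: int, cores_per_chip: int) -> int:
    """Smallest whole number of chips that provides at least target_cores."""
    return math.ceil(target_cores / cores_per_chip)


def servers_needed(chips: int, sockets_per_server: int) -> int:
    """Smallest whole number of servers that can seat all the chips."""
    return math.ceil(chips / sockets_per_server)


if __name__ == "__main__":
    chips = chips_needed(TARGET_CORES, CORES_PER_CHIP)
    servers = servers_needed(chips, SOCKETS_PER_SERVER)
    print(f"{chips} chips in {servers} four-socket servers")
```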
Help stamp out iliturcy.
Now all we need is memory density and IO throughput to catch up. For most server/VM deployments, memory and IO are your bottlenecks. Sure, this will be useful in niche markets such as scientific research, but a "cloud" processor it is not... without the IO and RAM to back up all those cores, very few people will be able to actually make use of them in a single machine.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Does this chip have HT? I suspect that Intel, as usual, hides facts about these things. Maybe they "forgot" to tell everyone that the cores were *logical* ones?
48 physical cores + 2x HT for each core would have been nice though, but they would have marketed it like a 96-processor chip for sure then.
ULTRACORE!!!
just hand it over now - we'll find uses for it.
I'm probably the only one, being married and denied sex, but Bangalore reminds me of Humpalot, Ivana Humpalot.
In the late '80s a networked computer chip for multiprocessing was created http://en.wikipedia.org/wiki/Transputer
It was pretty awesome and used C with some libraries or Occam2 as the programming language. You could link up as many of these babies as you would like and they would communicate between themselves for your parallel programs.
It's nice to see something similar in scale coming into the main stream more/again.
Hey, if it works for Gillette, I say go for it
ULTRACORE!!!
GODLIKE!
I would give everything i own for a little bit more.
http://slashdot.org/comments.pl?sid=1435180&threshold=-1&commentsort=0&mode=thread&cid=30021114
Per my subject-line: Read that, & get back to us (since you are allegedly a dev mgr. @ MS)... this isn't to "antagonize you", but, rather to help you folks @ MS spot possible problems in Windows VISTA/Windows Server 2008/Windows 7 especially, due to their WFP/NDIS6 firewall design, problems in the local DNS cache client, & in HOSTS files.
Thanks for your time.
APK
P.S.=> I am not sure WHY you've avoided my points, because they are to help "make a better Windows" is all, but I assume because of your being busy. However, your business is making Windows allegedly, so why not take a peek @ something that may point to issues!
(Definite possibles per:
1.) ROOTKIT.COM's findings on unhooking the WFP/NDIS6 firewall easier than the older Windows 2000/XP/Server 2003 setup apparently, WITH CODE THE SAID DOES SO NO LESS in the url pointing to it
2.) Problems in the local DNS Client cache (fails/lags for folks that use "LARGISH" HOSTS files (plenty of us, many 1000's, per Spybot S&D users + folks @ mvps.org (to only name a small few) & even folks like Mr. Oliver Day espouse the use of HOSTS files, finding they make him go faster, AND SAFER, online by far as evidence to it, as well as users who have used a security guide of mine, of which HOSTS are a major part, not seeing any malware intrusions AND GOING FASTER ONLINE TOO)
3.) MS seemingly intentionally removing the ability to use the smaller & faster 0 based blocking IP address in a HOSTS file (when it was MS who put it into Windows, from 2000 in a SERVICE PACK, not its original OEM CD release distro mind you, & leaving it there clear into VISTA, until 12/09/2008 MS patch tuesday, when it (a good thing) was removed for SOME reason (makes no sense, unless somehow the dual IPv4 + IPv6 setup in VISTA onwards facilitates the need for this, & I do NOT think it does @ this point)
AND, more...)
Again, thanks for your time, & I hope this aids MS in "making a better Windows than Windows is", per those points... apk
I'm waiting for Cloud 2.0
I like Hardcore. But that may be because of my pr0n bias. I'm shocked to see that nobody has yet asked us to imagine a Beowulf cluster of Hardc... Ouch, stop hitting me.
--= Isn't it surprising how badly I spell ?
TFA says "Technology: 65nm CMOS Process". Is that a bug in TFA?
What happens if you copy some text with ligatures and paste it into a program that doesn't expect them? What happens if you search for "finally" in a document where "fi" was replaced with a ligature?
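A small sketch of what happens with the search, and one common way around it. Using Unicode NFKC normalization to fold the ligature back into plain letters is my suggestion, not something from the thread; `fold_ligatures` is a hypothetical helper name.

```python
import unicodedata


def fold_ligatures(text: str) -> str:
    """Apply NFKC normalization, which expands compatibility characters
    such as the 'fi' ligature (U+FB01) into their plain-letter equivalents."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "\ufb01" is the single-character 'fi' ligature.
    document = "We \ufb01nally shipped it."

    # A naive search misses the word, because the document holds one
    # ligature character where the query has the two letters 'f' and 'i'.
    print("finally" in document)

    # After folding, the same search finds it.
    print("finally" in fold_ligatures(document))
```

NFKC also folds other compatibility characters (full-width letters, superscript digits and so on), so it suits a search index better than text you intend to display unchanged.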
Look lower down.
The top diagram and technology is the older 80-core test-chip - but the article doesn't make it immediately obvious.
The core in the 48-core chip is around 6mm^2 (excluding L2 and other uncore) on 45nm, at 567mm^2 total area.
You could fit 4 65nm Cortex A9s in that space. But maybe the x86 core is quad-threaded, like Larrabee's cores.
That's really going to depend on the intended use. And on whether the intended use involves problems that a) can be efficiently parallelized, and more importantly, b) actually have been efficiently parallelized.
There's also the degree to which a task can be effectively done in parallel, which may well vary throughout the task, including having to wait until the last of a set of subtasks has finished. Things get interesting when the number of parallel subtasks is greater than the number of execution units, especially where it isn't a multiple of the number of EUs and/or the subtasks are diverse.
But unless each core gets its own memory bus and its own dedicated memory with its own cache,
Which is likely to give issues with memory coherence, "memory" effectively becoming additional "cache", with a need for a sophisticated MMU to sort things out.
I rather expect that the only things that are going to be parallelized to their maximum potential are wait states. All that said, it will still probably run faster than a two- or four-core CPU for many tasks, but it won't be running 48 times faster.
Performance never scales linearly with the number of CPUs/cores, even in an ideal situation.
Guess I will wait for AMD to make an x64 version of it. I don't care how many cores you have, I am not going back to 32 bit! Especially if I have 48 cores, I REALLY want to use more than 3 gig of RAM!
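On the case where subtasks outnumber execution units, a minimal sketch. It assumes every subtask takes the same time and there is no communication cost, which is the friendliest case possible; diverse subtasks only make it worse. The function names are hypothetical.

```python
import math


def rounds_needed(subtasks: int, units: int) -> int:
    """Number of back-to-back rounds needed when each execution unit
    runs one equal-length subtask per round."""
    return math.ceil(subtasks / units)


def utilization(subtasks: int, units: int) -> float:
    """Fraction of available unit-rounds that actually do useful work."""
    return subtasks / (rounds_needed(subtasks, units) * units)


if __name__ == "__main__":
    # 48 and 96 subtasks fill 48 units exactly; 49 forces a second round
    # in which 47 units sit idle waiting for the last subtask.
    for n in (48, 49, 96):
        print(f"{n} subtasks: {rounds_needed(n, 48)} rounds, "
              f"{utilization(n, 48):.0%} utilization")
```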
If one notes from the articles on the architecture, Intel is *still* not biting the bullet of reversible computing [1]. There has to be a fair amount of the architecture built into the frequency and voltage management of the chips, not to mention the chip layers involved in voltage management. I would like to know whether they are doing all the voltage management on-chip or require new power supplies and/or motherboards, meaning one is unlikely to see plug-in replacements of CPUs on desktop/laptop PCs. One could adopt a conspiracy perspective and argue that this is Intel's attempt to redefine the "standard" computing platform and force all "modern" users to purchase new computers! (Wouldn't that sell hundreds of millions or billions of chips???)
1. For those unfamiliar with "reversible computing" it evolved from the work of H. Bremermann, R. Landauer and C. Bennett, largely at IBM in the 1960's and 1970's and pointed out that one could not "destroy" bits without generating heat (Laws of entropy). As a result the only way to do computing without generating wasteful energy consumption (in the form of heat radiated and therefore bumping into the limits of heat dissipation per chip) would be to perform computations reversibly. I.e. you never destroy mass/energy/charge during a computation -- you simply return it to its original state. That is "reversible computation". Unfortunately manufacturers like Intel and AMD have not chosen to pursue this aggressively (one would have to believe that there may be some financial motivation behind this). I would tend to view this as pushing existing designs, technologies, instruction sets and limits to their farthest bounds before executing a shift to reversible computing. It may be observed that Eric Drexler, in Nanosystems, Chapter 12, "Nanomechanical Computational Systems" (published in 1992) explained the operation of an atomic scale mechanical gate array that did function as a reversible computational architecture, very much like a "reversible" abacus, because the energy required to reset the calculations was significantly less than that required to erase the matter/energy contained in them.
So the information is out there -- and the question remains when will manufacturers bite the bullet and transform the entire framework into a reversible one? Now in general one doesn't want to accept the delays of reversing the computation when a simple CLR will do.
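For scale, the heat cost of destroying a bit that the footnote refers to is usually written as Landauer's bound, k_B * T * ln 2 per erased bit. A quick sketch of the number; the 300 K operating temperature is an assumption, and this is the theoretical floor, not what any real chip dissipates.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy, in joules, dissipated when one bit is irreversibly
    erased at the given temperature, per Landauer's bound."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Assumed temperature: roughly room temperature.
    energy = landauer_limit_joules(300.0)
    print(f"{energy:.3e} J per erased bit at 300 K")
```

That works out to a few zeptojoules per bit, so the bound only becomes the limiting factor once everything else about switching energy has been squeezed out.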
In order to achieve the highest efficiency, we keep the same instructions loaded on 47 of the processors, and reserve the full bus for the 48th.
Modding me -1 troll doesn't make me wrong.