Intel Shows 48-Core x86 Processor
Vigile writes "Intel unveiled a completely new processor design today, which the company is dubbing the 'Single-chip Cloud Computer' (previously codenamed Bangalore). Justin Rattner, the company's CTO, discussed the new product at a press event in Santa Clara and revealed some interesting information about the goals and design of the new CPU. While terascale processing has been discussed for some time, this new CPU is the first to integrate full IA x86 cores rather than simple floating point units. The 48 cores are set 2 to a 'tile', and instead of relying on a large cache structure, each tile communicates with the others via a 2D mesh network capable of 256 GB/s."
It was called Bangalore to remind you where to call if you need any support for it.
> This post is copyrighted by Robert Nelson for the private use of his audience. Any other use of this post or of any pict
Your sigfile is offensive. What have ye got against the Scots?
Why is everything called cloud these days? Yet another du jour buzzword. Is this really justified here?
Microsoft once had a podcast where they were talking about multi-core CPU kernels. Their belief was that once you had 50+ cores, you would be able to have a mutex for every single COM object element, simply because you could.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Only 48 cores? I'd ask them to double that, but reasonably, 64 cores should be enough for anybody.
If you can read this, I forgot to post anonymously.
With 48 processors you can have your system 98% idle running your typical application at full speed rather than just 50% or 75% idle as is the norm now.
because now school administrators only have to install SETI@HOME on 100 48-core computers instead of 5000 standard computers.
This new Cloud processor should create synergies with my SOA Portal system and allow me to deploy Enterprise B2B Push based Web 2.0 technologies!
Unlike Stanford University, UCSB lacks the money to build a full-blown multiprocessor system. If UCSB had such a system back in the 1990s, then UCSB would likely have produced as much multiprocessor research as Stanford University.
Because this 48-core chip will eventually be a commercial product mass-produced by the millions, it will be cheap. It will let UCSB build or buy an inexpensive multiprocessor system.
A bunch of graduate students is already salivating at the prospect. They are drooling.
Is there enough cpu to chipset bandwidth to make use of all this cpu power?
Can someone elaborate on why you'd want 48 full processors, rather than a processor with two (dual) or four (quad) "cores" (I'm presuming core in this case == FPU in the article)?
Bad assumption. In this case, we're talking about (what you would consider) a 48 core CPU. Previous designs would have apparently contained only a small number of full processing cores, and a large number of parallel units suitable only for floating point calculations (which can be great for various types of scientific calculations and simulations). This new design contains 48 discrete IA x86 cores.
Seems like the type of processor Grand Central Dispatch was designed for.
Yaz.
That was an offshoot technology. They've finally got all the bugs ironed out and the CPU is much less prone to "uncontrolled exothermic reactions" than it used to be.
This space for rent. All reasonable inquiries will be entertained at proprietor's discretion.
That's what each channel is. I forget exactly, but each DDR channel is almost 200+ pins (RDRAM was considered a big win because it is about 80). And pins == money (mainly in die area).
Why can companies not come up with decent code names? For instance, this would be the perfect case for it being codenamed "Beowulf".
They're using geographical names (cities, places, lakes, rivers) to avoid having to register the codename as a trademark. Geographical names can't be trademarked so no one will use your codename for his trademark.
Can someone elaborate on why you'd want 48 full processors, rather than a processor with two (dual) or four (quad) "cores" (I'm presuming core in this case == FPU in the article)? Supposedly Win7's SMP support becomes much more effective at the 12-16 core threshold.
The first thought that comes to mind is video processing and CGI animation, because those applications are embarrassingly parallel.
And those companies usually have the money to spend on top of the line hardware.
Eventually this will trickle down to the consumer level, as always, and in 10 years people will be able to do real-time, movie-quality CGI on their home computers.
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
A mutex (MUTual EXclusion) is a software mechanism by which one thread or process can (usually temporarily) lock a resource (such as a memory location) so that no other thread or process can access it.
It is most often required because resources are normally not 'atomic.' For instance, a string in memory is made up of many machine words and a CPU cannot read or write multiple machine word values in one operation. The danger is that while one CPU is writing to such a non-atomic collection of values, another might be trying to read from (or write to) it, creating a situation where that second process reads part of the old data and part of the new data (essentially garbage data).
So the idea of a MUTEX is born, in which an atomic value is leveraged to allow a thread to reserve such resources, signaling others (if they respect the MUTEX as well) to wait their turn.
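To make the torn-read problem described above concrete, here is a minimal Python sketch. It is only an illustration: the two-element list stands in for a multi-word value, and the name count_torn_reads is made up for this example. A writer updates both elements, a reader compares them, and both respect the same lock, so the reader should never see half of an update.

```python
import threading


def count_torn_reads(iterations=20_000):
    """Return how many times the reader saw a half-updated record."""
    record = [0, 0]          # stand-in for a value spanning several machine words
    lock = threading.Lock()  # the mutex guarding `record`
    torn = 0                 # only ever modified by the reader thread

    def writer():
        for i in range(1, iterations + 1):
            with lock:       # reserve the record for the whole two-step update
                record[0] = i
                record[1] = i

    def reader():
        nonlocal torn
        for _ in range(iterations):
            with lock:       # wait our turn, then take a consistent snapshot
                first, second = record[0], record[1]
            if first != second:
                torn += 1

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return torn


if __name__ == "__main__":
    print("torn reads with mutex:", count_torn_reads())
```

The protection only works because both threads take the lock. A reader that skipped it could observe the record between the two writes, which is the "part old, part new" case.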
"His name was James Damore."
I think it's more likely we'll see kibicores and mebicores.
What is worse is that they've done away with cache coherence. So I don't think you can take a 48 thread mysql / java process and just scale it. You COULD use forked processes that don't share much (i.e. postgres/apache/php).
-Michael
The reason the i7 gains nothing going from double to triple channel memory is that the memory controller is power limited and so can only run at reduced clocking on triple channel configurations: 800MHz, down from 1333MHz. Of course for most workloads having 50% more data in RAM instead of glacially slow storage is a win =)
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
A cache line on a modern Intel/AMD processor is actually 512 bits, or 64 bytes.
A memory bus 512 bits wide wouldn't really help much, though -- right now when dealing with memory, most of the time is spent in the various latencies. When you are fetching a lot of memory sequentially, you can get insane speeds even today. But that's not how you usually read memory -- instead, you read a few words from different locations, and the memory controller needs to activate the correct bank, row and column before you get what you need. On typical PC3-10600 DDR3, that means at least 15 bus cycles just waiting around for the memory to adjust. Making the bus 512 bits wide would speed up the actual transfer to one bus cycle from the 4 it currently takes, but that would only mean an improvement of about 15% -- at a huge cost for having to accommodate those 384 extra data lines on the chip, socket, motherboard and RAM. It's better just to try to speed up the memory so burst transfers happen "fast enough".
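The "about 15%" figure can be checked with quick arithmetic. This Python sketch uses only the numbers quoted above (15 cycles of latency, 4 transfer cycles today, 1 with a 512-bit bus). The function name is made up, and the model assumes every access pays the full latency, which real controllers can partly hide.

```python
LATENCY_CYCLES = 15  # bank/row/column setup, figure taken from the comment above


def access_cycles(transfer_cycles, latency_cycles=LATENCY_CYCLES):
    """Total bus cycles for one random access: wait first, then transfer."""
    return latency_cycles + transfer_cycles


narrow = access_cycles(4)          # current bus: 15 + 4 = 19 cycles
wide = access_cycles(1)            # 512-bit bus: 15 + 1 = 16 cycles
saving = (narrow - wide) / narrow  # fraction of time saved per access

if __name__ == "__main__":
    print(f"{narrow} cycles -> {wide} cycles, saving {saving:.1%}")
```

Three cycles saved out of 19 is a little under 16%, in line with the estimate above.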
I don't know about NVIDIA cards, but at least for ATI the card doesn't actually have a 256 bit memory interface -- instead, it has 4 completely separate 64-bit memory channels connected to a fast ring bus. The interleaving of data on those separate memory channels is done very coarsely -- basically, entire textures and such are allocated on a single channel. This means that when that texture is being fetched, the 3 other channels can serve other requests.
This is the way I see CPUs evolving too -- even on current hardware, namely Phenom II, you get better performance when you ungang the memory channels and wait 8 cycles for a single memory transfer instead of 4, because that way you get to wait on separate latencies on the separate channels at the same time. Of course, in the perverse case all the data you want to access resides on one of the channels, but the chance of that happening by accident is pretty much nil.
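A toy model of the ganged versus unganged trade-off, in Python. The assumptions are loud ones: every access costs 15 latency cycles plus its transfer cycles (4 ganged, 8 unganged, as quoted above), a channel handles one request at a time, and nothing is pipelined. The function names are invented for this sketch, and real memory controllers are a lot smarter than this.

```python
LATENCY_CYCLES = 15    # assumed per-access latency, as in the estimate above
GANGED_TRANSFER = 4    # both channels act as one wide channel
UNGANGED_TRANSFER = 8  # each 64-bit channel moves a cache line on its own


def ganged_cycles(num_requests):
    """All requests queue on the single wide channel."""
    return num_requests * (LATENCY_CYCLES + GANGED_TRANSFER)


def unganged_cycles(requests_per_channel):
    """Channels work in parallel, so the busiest channel sets the total."""
    per_access = LATENCY_CYCLES + UNGANGED_TRANSFER
    return max(count * per_access for count in requests_per_channel)


if __name__ == "__main__":
    print("ganged, 100 requests:  ", ganged_cycles(100))
    print("unganged, 50/50 split: ", unganged_cycles([50, 50]))
    print("unganged, all on one:  ", unganged_cycles([100, 0]))
```

Under these assumptions an even split comes out well ahead (1150 cycles against 1900), while the perverse case where everything lands on one channel is the slowest of the three (2300).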