Larrabee Based On a Bundle of Old Pentium Chips
arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single instruction, multiple data) vector processing unit."
I doubt it. Maybe they mentioned the Pentium as an example to explain an in-order superscalar architecture as opposed to more modern CPUs.
-There is a lot of overhead in the P54C to execute complex CISC operations that are completely useless for graphics acceleration.
-The P54C was manufactured in a 0.6 micron BiCMOS process. Shrinking this to 0.045 micron CMOS (about 13x smaller in linear dimensions, more than 100x smaller in area!) would require a serious redesign, all the way up to the RTL level. Circuit design has had to evolve with process technology.
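For a rough sense of that scaling, here is a quick back-of-the-envelope check in Python. It only compares the two nominal feature sizes quoted above; the function name is made up for illustration, and real process shrinks change far more than feature size.

```python
# Nominal feature sizes quoted in the comment above, in microns.
OLD_PROCESS_UM = 0.6    # P54C BiCMOS process
NEW_PROCESS_UM = 0.045  # 45nm CMOS process


def shrink_factors(old_um, new_um):
    """Return (linear, area) shrink factors between two feature sizes."""
    linear = old_um / new_um
    area = linear ** 2  # area scales with the square of the linear dimension
    return linear, area


linear, area = shrink_factors(OLD_PROCESS_UM, NEW_PROCESS_UM)
print(f"linear shrink: about {linear:.1f}x")
print(f"area shrink:   about {area:.0f}x")
```

So "more than 100x smaller" holds for area, while the linear shrink is closer to 13x.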
-a lot more...
Larrabee is going to be Intel's next creation in the GPU world: a many-core GPU with the following peculiarities:
- It is fully compatible with the x86 instruction set (whereas other GPUs use different architectures, often with instruction sets that aren't as well suited to general-purpose computing).
Thus, the Larrabee could *also* be used as a many-core main processor (if popped into a QuickPath socket) and used to run a good multicore OS. That's not achievable with any current GPU (both ATI's and nVidia's completely lack some control structures: neither can use subroutines, so everything must be inlined at compile time).
- Unlike most current Intel x86 CPUs, it features a shallow pipeline that executes instructions in order. Hence the Larrabee (and the Silverthorne, which shares these characteristics) has regularly been compared with the old Pentiums (which also share them) ever since the initial announcement, including in TFA.
- It features more cores with narrower SIMD: 32 cores, each able to handle 16 32-bit floats simultaneously. By comparison, nVidia's CUDA-compatible GPUs, for example, have only up to 16 cores, but each can execute 32 threads over 4 cycles and keep up to 768 threads in flight.
This enables Larrabee to cope with slightly more divergent code than traditional GPUs and makes it a good candidate to run stuff like GPU-accelerated raytracing.
Hence all the recent technical demos running Quake 4 with raytracing mentioned on /.
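To make the "narrower SIMD copes better with divergence" point concrete, here is a small Python sketch. The lane count follows from the 512-bit and 32-bit figures above. The utilization model is a deliberate simplification and an assumption of mine, not something from TFA: each branch path taken by at least one lane costs one full pass over the vector, with the other lanes masked off.

```python
SIMD_WIDTH_BITS = 512  # vector unit width from the summary
FLOAT_BITS = 32        # single-precision float
CORES = 32             # core count from the summary


def lanes_per_core(simd_bits=SIMD_WIDTH_BITS, elem_bits=FLOAT_BITS):
    """Number of elements one vector instruction processes."""
    return simd_bits // elem_bits


def lane_utilization(active_lanes_per_path, width):
    """Fraction of lane-slots doing useful work in a divergent vector.

    Simplified model (assumption): every branch path with at least one
    active lane is executed as a separate full-width pass.
    """
    passes = sum(1 for n in active_lanes_per_path if n > 0)
    if passes == 0:
        return 0.0
    return sum(active_lanes_per_path) / (passes * width)


lanes = lanes_per_core()
print("lanes per core:", lanes)
print("floats per cycle across all cores:", lanes * CORES)

# 32 work items: the first 16 take branch A, the last 16 take branch B.
wide = lane_utilization([16, 16], width=32)   # one 32-wide group
narrow_a = lane_utilization([16, 0], width=16)  # first 16-wide vector
narrow_b = lane_utilization([0, 16], width=16)  # second 16-wide vector
print("32-wide utilization:", wide)
print("16-wide utilization:", narrow_a, narrow_b)
```

In this toy case the 32-wide group runs both paths at half efficiency, while each 16-wide vector happens to be uniform and stays fully busy. Real workloads won't split this neatly.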
So much for what Intel tells you.
Now, the old and experienced geek will also notice that Intel has so far only put out press releases and technical demos running on plain regular multi-chip, multi-core Intel Core systems (just promising that the real chip will be even better than the demoed stuff).
Meanwhile, ATI and nVidia are churning out new "half"-generations every 6 months.
And the whole Larrabee thing is starting to sound like big vaporware.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Right. It clearly isn't using the Pentium design, but a Pentium-like design.
To that, they will have added SMT, because (a) in-order designs adapt well to SMT, since they have a lot of pipeline bubbles, and (b) there will be a lot of latency in the memory system, and SMT helps hide that. I would assume 4-way SMT, but maybe 8-way. Larrabee will therefore support 128 or 256 hardware threads. nVidia's GT280 supports 768.
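The thread counts are just cores times SMT ways. A trivial Python sketch, where the 4-way and 8-way figures are the guesses from this comment, not anything Intel has confirmed:

```python
CORES = 32  # core count from the summary


def hardware_threads(cores, smt_ways):
    """Total hardware threads for a chip with `cores` cores and N-way SMT."""
    return cores * smt_ways


for smt_ways in (4, 8):  # guessed SMT widths, not confirmed figures
    print(f"{smt_ways}-way SMT: {hardware_threads(CORES, smt_ways)} hardware threads")
```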
The closest chip I can think of right now is Sun's Niagara and Niagara 2 processors, except with a really beefy SIMD unit on each core, and a large number of cores on the die because of 45nm. I think Niagara 3 is going to be a 16 core device with 8 threads/core, can anyone confirm?
Note that this is pretty much what Sony wanted with Cell, but Cell was 2 process shrinks too early. 45nm PowerXCell32 will have 32 SPUs and 2 PPUs (whereas Larrabee looks like it is matching an equivalent of a weak-PPU with each SPU equivalent). It could run at 5GHz too... power/cooling notwithstanding.
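According to the diagram in the article, the Larrabee has 8 GDDR memory interfaces, which will supply rather a lot of bandwidth. Presumably, those are GDDR4 or GDDR5 interfaces. Assuming 4.5 Gb/s is the per-pin data rate, that's 4.5 Gb/s * 8 = 36 Gb/s = 4.5 GB/s for each bit of interface width, so the total depends on how wide each interface is.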
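The bandwidth estimate above hinges on how wide each memory interface is, which the diagram doesn't say. Here is a minimal Python sketch, assuming the 4.5 Gb/s figure is a per-pin data rate. The 32-bit interface width in the second call is purely a hypothetical value for illustration, not a number from the article.

```python
def aggregate_bandwidth_gbytes(per_pin_gbps, interfaces, pins_per_interface):
    """Aggregate memory bandwidth in GB/s.

    per_pin_gbps:       data rate of one pin, in gigabits per second
    interfaces:         number of memory interfaces
    pins_per_interface: data width of each interface, in bits (assumption)
    """
    total_gbits = per_pin_gbps * interfaces * pins_per_interface
    return total_gbits / 8  # 8 bits per byte


# One data pin per interface reproduces the 4.5 GB/s figure above.
print(aggregate_bandwidth_gbytes(4.5, interfaces=8, pins_per_interface=1))

# Hypothetical 32-bit-wide interfaces, for illustration only.
print(aggregate_bandwidth_gbytes(4.5, interfaces=8, pins_per_interface=32))
```

Whatever the real width turns out to be, the total scales linearly with it.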
Getting data onto and off the board will still be a challenge - you're limited by PCI Express transfers.