Larrabee Based On a Bundle of Old Pentium Chips
arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single input, multiple data) vector processing unit."
Ah, the dreams of the past: a Beowulf cluster of old computers come to life :)
A little context might help. This isn't the Inquirer for god's sake.
This is just unbelievably good news. After all this time, I get to start telling Pentium jokes again! I never thought I would!
Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
Get your acronyms right....
No sig today...
The card features one 150W power connector, as well as a 75W connector. Heise deduces that this results in a total power consumption of 300W,
Um, that just doesn't seem to quite add up to me.
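One way the numbers could be reconciled, as a rough sketch: a PCI Express x16 graphics slot can itself supply up to 75W, so if Heise counted the slot as well (an assumption; the quoted text doesn't say so), the two connectors plus the slot come to 300W. That would be a maximum budget rather than measured consumption. The names below are made up for illustration.

```python
# Assumption: the slot's own power (up to 75 W for a PCIe x16 graphics slot)
# is counted on top of the two auxiliary connectors from the quote.
SLOT_WATTS = 75
CONNECTOR_WATTS = [150, 75]  # the two connectors mentioned in the quote


def board_power_budget(connectors, slot=SLOT_WATTS):
    """Upper bound on board power: slot supply plus every auxiliary connector."""
    return slot + sum(connectors)


print(board_power_budget(CONNECTOR_WATTS))  # slot + 150 W + 75 W
```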
Prediction: The real iPhone killer is going to be sex robots from Japan. Think about it.
good. sounds like a sensible engineering decision.
on the basis that..
the design is well known, understood and has had rigorous testing in the field
they will no doubt fix any understood errors first
limits the RnD to the multicore section
as long as the chip performs well for the silicon overhead then they should feel free to cram as many in as they want.
seems perfectly sensible to me.
I doubt it. Maybe they mentioned the Pentium as an example to explain an in-order superscalar architecture as opposed to more modern CPUs.
-There is a lot of overhead in the P54C to execute complex CISC operations that are completely useless for graphics acceleration.
-The P54C was manufactured in a 0.6 micron BiCMOS process. Shrinking this to 0.045 micron CMOS (more than 100x smaller in area!) would require a serious redesign up to the RTL level. Circuit design has evolved with process technology.
-a lot more...
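For a sense of scale on that shrink, here is a quick back-of-the-envelope sketch. It only compares the two feature sizes quoted above and assumes ideal scaling, which real layouts never achieve; the variable names are just for illustration.

```python
# Feature sizes from the comment above, in nanometres.
OLD_NM = 600  # 0.6 micron BiCMOS
NEW_NM = 45   # 0.045 micron CMOS

# Ideal scaling: linear dimensions shrink by the ratio, area by its square.
linear_shrink = OLD_NM / NEW_NM
area_shrink = linear_shrink ** 2

print(f"linear: {linear_shrink:.1f}x, area: {area_shrink:.0f}x")
```

So "more than 100x" holds for area under these assumptions, while the linear shrink is closer to 13x.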
Larrabee is going to be Intel's next creation in the GPU world: a many-core GPU with the following peculiarities:
- fully compatible with the x86 instruction set (whereas other GPUs use different architectures, often with instruction sets that are less well suited to general-purpose computing).
Thus, Larrabee could *also* be used as a many-core main processor (if popped into a QuickPath socket) and used to run a good multicore OS. That's not achievable with any current GPU (both ATI's and nVidia's completely lack some control structures: both are unable to use subroutines, and everything must be inlined at compile time).
- unlike most current Intel x86 CPUs, it features a shallow pipeline that executes instructions in order. Hence Larrabee (and Silverthorne, which shares these characteristics) has regularly been compared with the old Pentiums (which also share them) ever since the initial announcement, including in TFA.
- it features more cores with narrower SIMD: 32 cores, each able to handle 16 32-bit floats simultaneously. By comparison, nVidia's CUDA-compatible GPUs have up to 16 cores only, but each can execute 32 threads over 4 cycles and keep up to 768 threads in flight.
This enables Larrabee to cope with slightly more divergent code than traditional GPUs and makes it a good candidate to run stuff like GPU-accelerated raytracing.
Hence all the recent technical demos running Quake 4 in raytracing mentioned on /.
That's for what Intel tells you.
Now the old and experienced geek will also notice that Intel has only kept putting out press releases and technical demos running on plain regular multi-chip multi-core Intel Cores (just promising that the real chip will be even better than the demoed stuff).
Meanwhile, ATI and nVidia are churning out new "half"-generations every 6 months.
And the whole Larrabee is starting to sound like a big vaporware.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Larrabee is the Chief's cousin
Faith: n. -- That human impulse that drives them to steal appliances when the power goes out
Hey, only Intel provides you with a floating point that really floats - why, you never know where it's going to end up! Now that's floating! :D
From TFA "Heise also claims that the cores will feature a 512-bit wide SIMD (single input, multiple data) vector processing unit. The site calculates that 32 such cores at 2GHz could make for a massive total of 2TFLOPS of processing power."
I don't see how they get to 2 TFLops.
512-bit = 64 bit * 8 way SIMD or 32 bit * 16 way SIMD. Let's go with the bigger of these two and say we are performing 16 single-precision floating-point operations per clock cycle per core. 16 operations per clock per core * 32 cores * 2 billion clocks per second = 1024 single-precision GFLOPS. It looks more like 512 double-precision GFLOPS for 300 Watts, which means a DP teraflop on Larrabee will cost you 513 dollars a year at 10 cents/kWh. If we're considering single precision, we can cut this in half to 257 dollars per year per single-precision teraflop.
Compare to Clearspeed, which offers 66 DP GFLOPS at 25 Watts, costing 332 dollars for a sustained DP teraflop for a year.
Even the NVidia Tesla has better performance at single precision: you can buy 4 SP TFLOPS consuming only 700W, or 5.7 GFLOPS/Watt, for an annual power budget of 153 dollars per teraflop.
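The arithmetic above can be reproduced with a short sketch. The function names are made up for illustration, and the numbers are only the ones quoted in this comment (10 cents/kWh, running flat out all year). The `flops_per_lane_per_clock` parameter is an assumption added here: setting it to 2 (for example, counting a multiply-add as two operations) is one possible way to arrive at TFA's 2 TFLOPS figure, but the article excerpt doesn't say that.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 hours, ignoring leap years
PRICE_PER_KWH = 0.10        # dollars, the rate used in the comment


def peak_gflops(cores, simd_lanes, ghz, flops_per_lane_per_clock=1):
    """Peak GFLOPS = cores * SIMD lanes * clock (GHz) * ops per lane per clock."""
    return cores * simd_lanes * ghz * flops_per_lane_per_clock


def dollars_per_tflop_year(gflops, watts, price=PRICE_PER_KWH):
    """Electricity cost of sustaining one TFLOPS for a year at this efficiency."""
    watts_per_tflop = watts / (gflops / 1000.0)
    kwh_per_year = watts_per_tflop * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * price


larrabee_sp = peak_gflops(32, 16, 2.0)   # 16 x 32-bit lanes per core
larrabee_dp = peak_gflops(32, 8, 2.0)    # 8 x 64-bit lanes per core

for name, gflops, watts in [
    ("Larrabee DP", larrabee_dp, 300),
    ("Larrabee SP", larrabee_sp, 300),
    ("Clearspeed DP", 66, 25),
    ("Tesla SP", 4000, 700),
]:
    print(f"{name}: {gflops:.0f} GFLOPS, "
          f"${dollars_per_tflop_year(gflops, watts):.0f} per TFLOP-year")
```

These are peak numbers only; sustained throughput and real power draw would change the ranking.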
Obviously they're not just going to slap a bunch of Pentium cores on there and call it good. But the high-level design can probably start off with the P54, and just rip out stuff that doesn't need to be supported, possibly including:
Scalar floating-point, 16-bit protected mode, real mode, operand size overrides, segment registers, the whole v86 mode, the i/o address space, BCD arithmetic, virtual memory, interrupts, #LOCK, etc, etc.
Once you've done that, you'll have a much simpler model to synthesize down to an implementation. And with a slightly-modified compiler spec, you can crank out code for it with existing compilers, like ICC and GCC.
PPro was the first Intel processor that was RISC internally, with translation from x86. Whereas the original Pentium and the P-MMX were pure CISC. This is the main reason I seriously doubt they'd use P54C in Larrabee.
Escher was the first MC and Giger invented the HR department.
One does not "shrink" a chip by taking photomasks and shrinkenating.
'course not. You use a transmogrifier. In the industry, it is known as the "Bill Watterson" process.
It can also be used to turn photomasks into elephants, which, while less profitable, is immensely entertaining if the operator didn't see you change the setting.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Right. It clearly isn't using the Pentium design, but a Pentium-like design.
To that, they will have added SMT, because (a) in-order designs adapt to SMT well because they have a lot of pipeline bubbles and (b) there will be a lot of latency in the memory system and SMT helps hide that. I would assume 4 way SMT, but maybe 8. Larrabee will therefore support 128 or 256 hardware threads. nVidia's GT280 supports 768.
The closest chip I can think of right now is Sun's Niagara and Niagara 2 processors, except with a really beefy SIMD unit on each core, and a large number of cores on the die because of 45nm. I think Niagara 3 is going to be a 16 core device with 8 threads/core, can anyone confirm?
Note that this is pretty much what Sony wanted with Cell, but Cell was 2 process shrinks too early. 45nm PowerXCell32 will have 32 SPUs and 2 PPUs (whereas Larrabee looks like it is matching an equivalent of a weak-PPU with each SPU equivalent). It could run at 5GHz too... power/cooling notwithstanding.
http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1&fr=bf-home&trurl=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&lp=de_en&btnTrUrl=Translate
Actually, they got the "Gelsinger said so" remark from Expreview, itself a Chinese site:
http://en.expreview.com/2008/07/07/larrabee-unleashes-2-tflops-capacity (note they courteously attached the Larrabee board diagram leaked a while back):
"Gelsinger said the Larrabee will be a 45nm product featuring SIMD technique, 64-bit address. Besides, 32 of cores runing at 2.00 GHz will unleash 2 TFLOPS capacity, twice as much as the RV770XT."
But did Gelsinger really SAY those things?
Here is the Google translation of the same Heise article: http://translate.google.com/translate?u=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&hl=en&ie=UTF8&sl=de&tl=en
It seems that no matter which crappily translated version of the German article one looks at, it appears that Gelsinger said no such thing... The part about Larrabee containing P54C cores was clearly in a separate paragraph, written after a speculative question.
So I guess Expreview THOUGHT Pat said something after taking too short a look at the Heise article, after which CustomPC sensationalized the whole thing, not really bothering to actually read even the translated link it posted. Now some random Slashdotter is doing us the same courtesy.
There you go, folks- Internet reporting.
I bet Duke Nukem Forever is gonna look SWEET on one of these!
According to the diagram in the article, the Larrabee has 8 GDDR memory interfaces, which will supply rather a lot of bandwidth. Presumably, those are GDDR4 or GDDR5 interfaces, so that's 4.5 Gb/s * 8 = 36 Gb/s = 4.5 GB/s bandwidth.
Getting data onto and off the board will still be a challenge - you're limited by PCI Express transfers.
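A quick sketch of that memory bandwidth estimate, with the bits-to-bytes step spelled out. The second calculation is purely an assumption added here: if 4.5 Gb/s is a per-pin rate and each interface is 32 bits wide (neither is stated in the comment), the total would be much higher. The function name is made up for illustration.

```python
def aggregate_gbytes_per_s(links, gbit_per_s_each):
    """Total bandwidth in GB/s for `links` parallel links, 8 bits per byte."""
    return links * gbit_per_s_each / 8


# As written in the comment: 8 interfaces at 4.5 Gb/s each.
as_written = aggregate_gbytes_per_s(8, 4.5)

# Assumption (not in the comment): 4.5 Gb/s per pin, 32 data pins per interface.
PINS_PER_INTERFACE = 32
per_pin_reading = aggregate_gbytes_per_s(8 * PINS_PER_INTERFACE, 4.5)

print(f"as written: {as_written} GB/s, per-pin reading: {per_pin_reading} GB/s")
```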
It's mainly a question of "on which scale are we comparing chips".
Yes, the x86 instruction set is utterly ugly and horribly contrived compared to nice contemporary architectures like the 68k. Computing would probably involve fewer hoops had IBM decided to go with Motorola for their PCs (as a lot of other home computers, arcade machines and home consoles did).
*BUT*
if we place GPUs on the same scale, suddenly the x86 shines : it doesn't completely suck at branching, and has an actual stack that can be used to call sub procedures, has interrupts, etc.
It is an architecture able to run an OS.
nVidia's CUDA machines, on the other hand, mainly use SIMD masking for most conditional operations, aren't really brilliant when it comes to branching, and completely lack any way to do sub-procedures. Those chips have loads of registers. But instead of using them for register windows and RISC-style sub calls, they use the registers to keep more threads in flight.
It definitely makes a lot of sense from a functional point of view (those are GPUs, they are made to process fuck-loads of pixels per second), but this makes them unable to run Linux.
On that scale, having x86 on a GPU suddenly makes it a lot more interesting for uses outside the usual "draw triangles very fast". Even if x86 sucks to begin with.
And for the record: there was hardly any way the 68k architecture could ever have prevailed. It's a good one. But IBM never saw its PC as anything better than a glorified terminal. For that kind of machine, they were of course going for the cheapest possible chip.
Given a choice between a half-assed chip from Intel with a 16-bit extension quickly tacked onto a design inherited from early 8-bit chips (8008, 8080 and the competing Z80 - most assembler code can be directly recompiled for the 8088 after a few register renamings) AND a very nice chip from Motorola redesigned from the ground up to be a nice and clean 16/32-bit architecture designed for future expansion:
Of course they would pick the Intel. It's cheaper and there's no need for a future-proof 32-bit processor in a fucking "Terminal Deluxe".
And of course, because of the (relatively) low cost, because of the (very strong) brand recognition, because of the (somewhat) open platform enabling clones (in the sense that it was documented. Of course, Phoenix had to completely rewrite the BIOS because of copyright restrictions - but IBM considered Big Iron to be its main product and didn't mind such clones), and because they were taking on a relatively uncrowded market (most home computers were for homes, schools and small shops - PCs were marketed to corporations):
The PC was bound to take over the market very quickly - *with* its bad design (almost *because* of it). And was bound to set the standard, as bad as this standard is.
And by then, it was too late for IBM to take a better architecture to produce a "Terminal Deluxe Pro Mark-III" with a clean 68k chip.
Of course, had the PC had a less crippled OS, designed to be slightly more extensible and making fewer assumptions about the architecture than MS-DOS (you know, the "we laid everything around 1MiB and thought it would last for at least 10 years" by Mr. Gates), perhaps a switch to a better, different architecture could have been less painful, and a cleaner architecture could have blessed the PC world sooner.