Slashdot Mirror


Larrabee Based On a Bundle of Old Pentium Chips

arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single input, multiple data) vector processing unit."

24 of 286 comments

  1. Re:What the hell is Larrabee? by Darkness404 · · Score: 5, Informative

    Larrabee is the codename for a discrete graphics processing unit (GPU) chip that Intel is developing as a revolutionary successor to its current line of graphics accelerators. The video card containing Larrabee is expected to compete with the GeForce and Radeon lines of video cards from NVIDIA and AMD/ATI respectively. More than just a graphics chip, Intel is also positioning Larrabee for the GPGPU and high-performance computing markets, where NVIDIA and AMD are currently releasing products (NVIDIA Tesla, AMD FireStream) which threaten to displace Intel's CPUs for some tasks. Intel plans to have engineering samples of Larrabee ready by the end of 2008, with public release in late 2009 or 2010.[1]

    According to Wikipedia http://en.wikipedia.org/wiki/Larrabee_(GPU)

    --
    Taxation is legalized theft, no more, no less.
  2. Re:What the hell is Larrabee? by jandrese · · Score: 3, Informative

    According to TFA, it's a graphics card that Intel is making to compete with NVIDIA and ATI. I'm guessing it's going to be highly optimized for ray tracing, given Intel's statements in the past. Total power consumption estimates are jaw-dropping: TFA estimates around 300W.

    --

    I read the internet for the articles.
  3. SIMD = Single Instruction, Multiple Data by Joce640k · · Score: 4, Informative

    Get your acronyms right....

    --
    No sig today...
  4. Re:I'm no expert but by tlhIngan · · Score: 5, Informative

    The card features one 150W power connector, as well as a 75W connector. Heise deduces that this results in a total power consumption of 300W,

    Um, that just doesn't seem to quite add up to me.

    Power can come from multiple sources. In this case, you have a 150W power connector (probably an 8-pin PCIe one) and a 75W one (a 6-pin PCIe one). The remaining 75W comes from the PCIe slot itself.
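
    To make the arithmetic explicit, here is a quick tally. It assumes the commonly cited PCIe ratings: 75W for the x16 slot and 75W or 150W per auxiliary connector. The dictionary labels are just descriptive names, not anything from the article.

```python
# Power sources on the card as described above (watts). The ratings are the
# commonly cited PCIe limits and are used here as assumptions.
POWER_SOURCES_W = {
    "auxiliary connector (150W)": 150,
    "auxiliary connector (75W)": 75,
    "PCIe x16 slot": 75,
}


def total_board_power(sources: dict[str, int]) -> int:
    """Sum the maximum power each source can deliver."""
    return sum(sources.values())


if __name__ == "__main__":
    for name, watts in POWER_SOURCES_W.items():
        print(f"{name:28s} {watts:4d} W")
    print(f"{'total':28s} {total_board_power(POWER_SOURCES_W):4d} W")
```

    The total matches the 300W figure Heise arrived at once the slot's own 75W is counted.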

    Nothing terribly unusual - a number of cards are coming out in configurations like this, and 300W for a video card is starting to become the norm, depressing as it is.

  5. Re:What the hell is Larrabee? by Joce640k · · Score: 4, Informative

    Not quite...

    Larrabee is a general-purpose number cruncher with a high degree of parallelism.

    NVIDIA/ATI are moving towards making their graphics cards capable of running general-purpose code. Intel is coming from the other side, moving a general-purpose parallel compute engine towards doing graphics.

    Yes, it's a subtle difference, and yes, they'll meet in the middle; it's just a question of angles.

    Intel wants the parallel compute market more than it wants the graphics card market so that's who it's pitching this at.

    --
    No sig today...
  6. Re:Marko DeBeeste by sconeu · · Score: 3, Informative

    I can't believe it took this long for someone to find the "Get Smart!" reference.

    Would you believe.... 39 posts?
    How about 20?

    How about one FRIST POST and an In Soviet Russia?

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  7. Re:What the hell is Larrabee? by dpiven · · Score: 1, Informative

    Uh, I think you're talking about LARAMIE, not Larrabee.

  8. Re:Pentium 75? by Anonymous Coward · · Score: 5, Informative

    I don't care if you're a C64 fanboi, Pentiums made mistakes. Apple had nothing to do with it. Read here.

    And this also from the same source... "In June 1994, Intel engineers discovered a flaw in the floating-point math subsection of the Pentium microprocessor. Under certain data dependent conditions, low order bits of the result of floating-point division operations would be incorrect, an error that can quickly compound in floating-point operations to much larger errors in subsequent calculations. Intel corrected the error in a future chip revision, but nonetheless declined to disclose it."
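
    The quote doesn't show what those "data dependent conditions" look like in practice. The division most often cited in coverage of the bug is 4195835 / 3145727. The following is a minimal check, assuming a correct IEEE 754 double-precision FPU. It recomputes that quotient, multiplies it back, and compares the result with the true value and with the incorrect value widely reported for affected chips. On a flawed Pentium, the residual x - (x / y) * y was reportedly 256 rather than 0.

```python
# Classic FDIV sanity check: divide, multiply back, and inspect the residual.
X = 4195835.0
Y = 3145727.0

CORRECT_QUOTIENT = 1.333820449136241          # true value of X / Y
FLAWED_PENTIUM_QUOTIENT = 1.333739068902038   # widely reported buggy result


def fdiv_check(x: float, y: float) -> tuple[float, float]:
    """Return (quotient, residual) where residual = x - (x / y) * y."""
    q = x / y
    return q, x - q * y


if __name__ == "__main__":
    q, residual = fdiv_check(X, Y)
    print(f"quotient = {q:.15f}, residual = {residual}")
    rel_err = abs(FLAWED_PENTIUM_QUOTIENT - CORRECT_QUOTIENT) / CORRECT_QUOTIENT
    print(f"relative error of the buggy result: {rel_err:.2e}")
```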

  9. Re:The "Core" chips were based on the Pentium III by TeknoHog · · Score: 3, Informative

    PPro was the first Intel processor that was RISC internally, with translation from x86. Whereas the original Pentium and the P-MMX were pure CISC. This is the main reason I seriously doubt they'd use P54C in Larrabee.
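
    For anyone wondering what "RISC internally, with translation from x86" means in practice, here is a purely illustrative toy decoder. The micro-op names and the MicroOp type are hypothetical, not Intel's real internal format. A register-register add stays one operation. An add with a memory destination is split into load, add and store steps, roughly the way P6-class decoders break up read-modify-write instructions. The P54C executed such instructions in its pipelines without this kind of split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """A hypothetical RISC-like internal operation (not Intel's real format)."""
    op: str
    dst: str
    srcs: tuple[str, ...]


def decode(instr: str) -> list[MicroOp]:
    """Split a tiny subset of x86 'OP dst, src' syntax into micro-ops."""
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (part.strip() for part in operands.split(","))
    op = mnemonic.lower()
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        # Memory destination: read-modify-write becomes load -> operate -> store.
        return [
            MicroOp("load", "tmp0", (addr,)),
            MicroOp(op, "tmp0", ("tmp0", src)),
            MicroOp("store", addr, ("tmp0",)),
        ]
    # Register destination: a single simple operation.
    return [MicroOp(op, dst, (dst, src))]


if __name__ == "__main__":
    for text in ("add eax, ecx", "add [ebx], ecx"):
        print(text, "->", [(u.op, u.dst, u.srcs) for u in decode(text)])
```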

    --
    Escher was the first MC and Giger invented the HR department.
  10. Re:I doubt it by ratboy666 · · Score: 5, Informative

    The original Pentium (which went up to 200MHz by the end, not just 75MHz) used U and V execution pipes. No translation to micro-ops, and no out-of-order execution. Indeed, there shouldn't be a need for that in Larrabee anyway, given the number of cores. It would almost be better to get rid of the V pipe and add SIMD instead.
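
    The following is a rough model of in-order U/V dual issue. It assumes a heavily simplified version of the Pentium's pairing rules. Both instructions must come from a small set of "simple" ops, and the second may not read or write a register that the first one writes. The real rules have more cases (U-pipe-only instructions and other special cases). SIMPLE_OPS, Instr and ins are hypothetical names for this sketch.

```python
from dataclasses import dataclass

# Assumption: a heavily simplified subset of the P5 "simple instruction" set.
SIMPLE_OPS = {"mov", "add", "sub", "and", "or", "xor", "inc", "dec", "cmp"}


@dataclass(frozen=True)
class Instr:
    op: str
    writes: frozenset[str]
    reads: frozenset[str]


def ins(op: str, writes: str, reads: str = "") -> Instr:
    """Hypothetical helper: registers given as space-separated names."""
    return Instr(op, frozenset(writes.split()), frozenset(reads.split()))


def can_pair(u: Instr, v: Instr) -> bool:
    """True if v may issue in the V pipe alongside u in the U pipe."""
    if u.op not in SIMPLE_OPS or v.op not in SIMPLE_OPS:
        return False
    raw = bool(u.writes & v.reads)   # v needs a result u has not produced yet
    waw = bool(u.writes & v.writes)  # both write the same register
    return not (raw or waw)


def schedule(program: list[Instr]) -> list[tuple[Instr, ...]]:
    """Greedy in-order issue: pair adjacent instructions when allowed."""
    cycles, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append((program[i], program[i + 1]))
            i += 2
        else:
            cycles.append((program[i],))
            i += 1
    return cycles


if __name__ == "__main__":
    prog = [
        ins("mov", "eax", "esi"),
        ins("add", "ebx", "ecx ebx"),  # independent of the mov: they pair
        ins("add", "edx", "eax edx"),
        ins("sub", "eax", "eax edx"),  # reads edx written just before: no pair
    ]
    for n, group in enumerate(schedule(prog)):
        print(n, [i.op for i in group])
```

    Dropping the V pipe, as suggested above, would turn schedule into plain one-instruction-per-cycle issue. That is the trade being proposed: a simpler core in exchange for spending the area on SIMD.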

    Your comments on CISC are a bit off-base; the idea is to execute shaders in x86 machine code. They can be simple (limited flow control) or complex (general CPU/GPU).

    Out-of-order execution (i.e. Pentium Pro and later) is not so good with that many cores doing that kind of work. It would get the hardware into a lot of trouble. Better to keep it simple and add more cores.

    A better start point would probably have been ARM, but that would lose the compatibility edge. If Larrabee works, it will take the GP-GPU market by storm. It needs:

    1 - to present itself as a NUMA CPU (add a bit to tell the OS what it is for)
    2 - compiler optimizations for the particular CPU architecture, preferably broken into two pieces:
    2a - "straight line" shader code
    2b - branching code
    3 - a guide to the new NUMA characteristics.

    With that in place, a standard (BSD/LINUX) OS will be able to use it for regular jobs. Or, for those special "I need the SIMD unit" jobs. The biggest hassle is trying to split control of those new CPU units between OpenGL and the regular scheduler (this is a kernel hack that Intel will have to make). It would be easier to jam this into OpenSolaris, but that isn't anywhere near popular enough.

    Don't you want your video card to assist compiling large source when not gaming/modeling? Why not?

    And, a few "extra" points

    - Intel already has an optimizing compiler for the P54C architecture, and we have gcc.
    - The architecture, including U/V pipelines only used 3.1 million transistors.
    - A GeForce 7800 GTX has 302 million transistors, roughly 100x the number in the original Pentium processor.

    So I would think that using 32 reduced "Pentium Classic" cores would be quite feasible; you need some (lots of) logic to ensure that they can all access their respective memories. The general SIMD implementation will take quite a bit of real estate as well. Larrabee probably has a budget of around 600M transistors (a wild-ass guess, derived from the power consumption estimates).

    The gate size shrink should result in higher speeds. There may be a danger in the complex instruction interpretation routines, but these can be corrected. The single-cycle instructions are already a (more or less) synchronous design, and should scale trivially.
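
    Here is the transistor arithmetic from the list above, worked through. The 3.1M and 302M figures are the ones quoted in this comment, and the 600M budget is explicitly the poster's own guess.

```python
# Figures quoted in the comment; the 600M budget is the poster's own guess.
PENTIUM_TRANSISTORS = 3.1e6
GEFORCE_7800_GTX_TRANSISTORS = 302e6
CORES = 32
GUESSED_BUDGET = 600e6

core_logic = CORES * PENTIUM_TRANSISTORS   # scalar cores only
ratio = GEFORCE_7800_GTX_TRANSISTORS / PENTIUM_TRANSISTORS
left_over = GUESSED_BUDGET - core_logic    # for SIMD units, caches, interconnect

print(f"32 bare Pentium cores: {core_logic / 1e6:.1f}M transistors")
print(f"GeForce 7800 GTX / Pentium: {ratio:.0f}x")
print(f"remaining in a 600M budget: {left_over / 1e6:.1f}M ({left_over / GUESSED_BUDGET:.0%})")
```

    Under those assumptions, the bare scalar cores take only about a sixth of the guessed budget. Most of the area would go to the SIMD units and the memory plumbing.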

    Anything I am missing?

    I, for one, am looking forward to buying a desktop super-computer with Larrabee.

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  11. Re:What the hell is Larrabee? by ciroknight · · Score: 5, Informative

    Yes: 32 x 600MHz x 1 MIPS/MHz @ 0.5W per core == 19.2 GIPS @ 16W.

    Meanwhile...

    32 x ???MHz (unknown, but likely to be 900+ to be competitive with current designs) x 3+ MIPS/MHz + 32 x 512-bit SIMD units = OMGWTFHAX @ 300W.
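
    The same back-of-the-envelope numbers as a tiny calculator. The 900MHz and 3 MIPS/MHz inputs in the second line are the comment's guesses, not known Larrabee figures, and the SIMD units are deliberately left out because they don't fit a MIPS count.

```python
def aggregate_mips(cores: int, clock_mhz: float, mips_per_mhz: float) -> float:
    """Peak scalar throughput, in MIPS, for identical cores."""
    return cores * clock_mhz * mips_per_mhz


def total_watts(cores: int, watts_per_core: float) -> float:
    return cores * watts_per_core


# First line: 32 plain Pentium cores at 600MHz, ~1 MIPS/MHz, 0.5W each.
print(f"{aggregate_mips(32, 600, 1.0) / 1000:.1f} GIPS at {total_watts(32, 0.5):.0f} W")

# Second line: guessed 900MHz and 3 MIPS/MHz, scalar pipes only.
print(f"{aggregate_mips(32, 900, 3.0) / 1000:.1f} GIPS before counting any SIMD work")
```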

    Seriously. The "Pentium" base of this design is damned near irrelevant. At this point, all it's doing there is scheduling execution on the SIMD units. If you've seen any modern GPU designs, they're basically hugely parallel cores attached to a few "director" cores which put everything where it needs to go. The original Pentium is probably the most powerful Intel CPU with the least complicated design, and it predates the legacy MMX cruft entirely.

    --
    "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
  12. Re:I'm no expert but by i.of.the.storm · · Score: 2, Informative

    ...and 300W for a video card is starting to become the norm, depressing as it is.

    Not really; die shrinks have actually been driving power consumption down. If you look at this page: http://www.guru3d.com/article/radeon-hd-4850-and--4870-crossfirex-performance/3 you can see that the latest-generation Radeon 4850 and 4870 consume much less power than the power-hungry peak set by the 2900XT. A whole 4850 system uses less than 300W at full load. That's pretty damn impressive considering the ridiculous amount of performance it puts out.

    --
    All your base are belong to Wii.
  13. Re:Uh, isn't that true of the Core CPUs too? by marquis111 · · Score: 2, Informative

    The Intel Core is derived from the P6 architecture, which debuted with the Pentium Pro, not the Pentium. Its history goes: Pentium Pro, Pentium II/Pentium II Celeron/P2 Xeon, Pentium III/Pentium III Celeron/P3 Xeon, skip the Pentium 4 (Netburst architecture), Pentium M, Intel Core. So, this is still interesting news.

  14. Internet telephone game run amok, Slashdot helping by jharel · · Score: 5, Informative

    Hmm... Let's see where they got this from. They claim they got it from a Babelfish translation of Heise, a German site. (Yeah, start wincing now...)

    http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1&fr=bf-home&trurl=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&lp=de_en&btnTrUrl=Translate

    Actually, they got the "Gelsinger said so" remark from Expreview, itself a Chinese site:

    http://en.expreview.com/2008/07/07/larrabee-unleashes-2-tflops-capacity (note they courteously attached the Larrabee board diagram leaked a while back):

    "Gelsinger said the Larrabee will be a 45nm product featuring SIMD technique, 64-bit address. Besides, 32 of cores runing at 2.00 GHz will unleash 2 TFLOPS capacity, twice as much as the RV770XT."

    But did Gelsinger really SAY those things?

    Here is the Google translation of the same Heise article: http://translate.google.com/translate?u=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&hl=en&ie=UTF8&sl=de&tl=en

    No matter which crappily translated version of the German article one looks at, it appears that Gelsinger said no such thing... The part about Larrabee containing P54C cores was clearly in a separate paragraph, written after a speculative question.

    So I guess Expreview THOUGHT Pat said something after taking too short a look at the Heise article, after which CustomPC sensationalized the whole thing, not really bothering to actually read even the translated link it posted. Now some random Slashdotter is doing the same courtesy.

    There you go, folks: Internet reporting.
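
    Whether or not Gelsinger actually said it, the 2 TFLOPS figure quoted from Expreview is at least arithmetically consistent with 32 cores at 2.00 GHz and 512-bit SIMD units. That rests on two assumptions that are not in either article. First, the units operate on 16 single-precision lanes. Second, each lane completes one fused multiply-add (counted as 2 FLOPs) per cycle.

```python
# Assumptions (not from the article): 32-bit float lanes, and one fused
# multiply-add per lane per cycle counted as 2 FLOPs.
VECTOR_BITS = 512
FLOAT_BITS = 32
FLOPS_PER_LANE_PER_CYCLE = 2  # FMA = multiply + add


def peak_flops(cores: int, clock_hz: float) -> float:
    lanes = VECTOR_BITS // FLOAT_BITS  # 16 single-precision lanes
    return cores * clock_hz * lanes * FLOPS_PER_LANE_PER_CYCLE


print(f"{peak_flops(32, 2.0e9) / 1e12:.3f} TFLOPS")
```

    Consistent arithmetic says nothing about who said what, of course; it only shows the number wasn't pulled from thin air.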

  15. Re:I doubt it by georgewilliamherbert · · Score: 2, Informative

    Actually, I used to work at Intel (around the time of 0.6um) and one could, and indeed, did sometimes shrink chips just by "shrinkenating", or perhaps shrinkenating followed by a design rule check. The result was a chip that was cheaper to manufacture, and in most cases, ran faster.

    I know what you were saying, but for the benefit of the general audience:

    That works better if all the geometries scale linearly (line separation, aspect ratios, layer thicknesses, etc.). As a general rule, they change only slightly from one generation to the next, but there are often significant changes.

    And going from 0.6u to 0.35u to 0.25u to 0.18u to 0.13u to 90 nm to 65 nm to 45 nm is a few too many steps for that assumption to work....

    Particularly given that modern chip photomasks use a completely different phase-shift technology from the older ones. You couldn't size down older masks to a new process at all.
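
    For scale, here is what a purely linear "shrinkenating" would promise across the node list above. This is an idealization: it assumes every dimension scales by the same factor, so area scales with the square of the feature size. That is exactly the assumption the comment says breaks down.

```python
# Process nodes listed in the comment, in micrometres.
NODES_UM = [0.6, 0.35, 0.25, 0.18, 0.13, 0.090, 0.065, 0.045]


def ideal_area_factor(old_um: float, new_um: float) -> float:
    """Area scale for a pure linear shrink: every dimension scales by new/old."""
    return (new_um / old_um) ** 2


for old, new in zip(NODES_UM, NODES_UM[1:]):
    print(f"{old:>5} -> {new:<5} um: area x {ideal_area_factor(old, new):.2f}")

total = ideal_area_factor(NODES_UM[0], NODES_UM[-1])
print(f"0.6 um -> 45 nm overall: area x {total:.4f} (about 1/{1 / total:.0f})")
```

    Seven steps compound to roughly a 178x area reduction on paper. Each step's small deviation from linear scaling compounds too, which is why a straight mask shrink from 0.6um is not realistic.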

    Back to your main point, on why use P54 anyways... My guess is that they really want to kickstart their many-many-core work with this and walked back along their product line until they came to something with enough features, few enough transistors, and modern enough logic model / HDL or Verilog code that they could have a fair chance of translating and resynthesizing it rapidly.

    But that's stretching the available leak knowledge a ways. Someone will eventually go on record with the real details.

  16. Re:Compare with Niagara 2 and 3, and Cell by adisakp · · Score: 2, Informative

    Niagara has direct access to memory AFAIK.

    The big architectural difference with the Cell SPUs is that SPUs really are not meant to access system memory directly. Each SPU has a very limited local memory buffer (the 256 KB local store) that it can access directly. System memory can be modelled as a RAM disk, and accesses to system memory go through DMA, which can be considered the equivalent of an asynchronous file read/write in the RAM disk analogy.
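
    Here is the RAM disk analogy as a toy model. This is not the Cell SDK API: dma_get, spu_sum and the chunk size are hypothetical names chosen for illustration. The worker never indexes "main memory" directly. It only sees chunks copied in by an asynchronous transfer, and it uses double buffering so the next chunk is in flight while the current one is processed.

```python
from concurrent.futures import ThreadPoolExecutor

# "Main memory" that the worker never indexes directly.
MAIN_MEMORY = list(range(1, 10_001))
CHUNK_ITEMS = 256  # arbitrary chunk size standing in for a small local buffer


def dma_get(offset: int, count: int) -> list[int]:
    """Copy a chunk of main memory into a fresh local buffer."""
    return MAIN_MEMORY[offset:offset + count]


def spu_sum() -> int:
    """Sum main memory, fetching chunk n+1 while chunk n is being processed."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = dma_engine.submit(dma_get, 0, CHUNK_ITEMS)
        offset = CHUNK_ITEMS
        while True:
            buffer = pending.result()  # wait for the in-flight transfer
            if not buffer:
                break
            # Start the next transfer before computing on the current buffer.
            pending = dma_engine.submit(dma_get, offset, CHUNK_ITEMS)
            offset += CHUNK_ITEMS
            total += sum(buffer)  # compute on the local copy only
    return total


if __name__ == "__main__":
    print(spu_sum())
```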

  17. Re:What the hell is Larrabee? by trigeek · · Score: 2, Informative

    Where in the article did it say that Larrabee was an integrated solution? Did you not see the picture of the card in the article?

    --
    Sometimes I doubt your committment to SparkleMotion!
  18. Re:Pentium 75? by toddestan · · Score: 5, Informative

    It wasn't every time you divided. It only affected floating point operations, and Intel claims that only 1 in every 8.77 billion random divisions will show the error, and those familiar with the bug agree that Intel's analysis is more or less correct. That would explain how it got through the initial testing by Intel and that the bug wasn't noticed for a while by the general computing public. The whole thing was more of a PR disaster on Intel's part than anything else.

  19. Re:What the hell is Larrabee? by evilviper · · Score: 2, Informative

    I don't know why you'd suspect a Dvorak keyboard. The # sign isn't moved at all, and it's nowhere near the apostrophe.

    On a Dvorak keyboard, you look for words that are spelled correctly but make no sense in context... It happens a LOT, since all the vowels are directly adjacent.

    e.g. "It's very hat outside"
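
    A small sketch of that heuristic follows. It uses the unshifted Dvorak letter rows, where A O E U I sit together on the left home row, and flags a word that differs from the intended one by a single letter struck on a horizontally neighbouring key. Row stagger and vertical neighbours are ignored, which is a simplifying assumption.

```python
# Dvorak letter rows (unshifted); vowels A O E U I sit together on the home row.
DVORAK_ROWS = ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"]


def neighbours(layout_rows: list[str]) -> dict[str, set[str]]:
    """Map each key to the keys immediately left/right of it on the same row."""
    result: dict[str, set[str]] = {}
    for row in layout_rows:
        for i, key in enumerate(row):
            result[key] = {row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)}
    return result


def adjacent_key_slip(intended: str, typed: str, adj: dict[str, set[str]]) -> bool:
    """True if `typed` differs from `intended` by one letter hit on a neighbouring key."""
    if len(intended) != len(typed):
        return False
    diffs = [(a, b) for a, b in zip(intended, typed) if a != b]
    return len(diffs) == 1 and diffs[0][1] in adj.get(diffs[0][0], set())


ADJ = neighbours(DVORAK_ROWS)
print(adjacent_key_slip("hot", "hat", ADJ))
```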

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  20. Re:Pentium 75? by DurendalMac · · Score: 2, Informative

    They didn't give out free replacements at the beginning. They said you'd have to prove that you needed the proper mathematical accuracy. It was only after a big public stink that they offered a full replacement program.

  21. Re:What the hell is Larrabee? by TheRaven64 · · Score: 2, Informative

    The percentage will be a lot lower if each core has a 512-bit vector unit; that alone will likely double the core size. The P54C used a very simplistic branch predictor, which makes sense for graphics applications, where branches are relatively uncommon, and the misprediction penalty is much lower (it's insane on modern chips: I got a 25% speedup [on AMD, only around 15% on Intel] on some code the other day just by removing a couple of if statements that were almost always taken).

    Since it's intended as a dedicated graphics chip, they have two choices. They could either expose the micro-ops directly (which is what AMD and nVidia do: your GLSL programs are JIT-compiled to native code when they are run), or they could export a simplified version of x86. There are lots of things, like the string manipulation instructions and some of the complex addressing modes, that could be cut to save a lot of space in the decoder. If these generate an illegal instruction exception, the OS can trap and emulate them when they're needed, and Intel just needs to tell compiler writers 'don't use these instructions; they will be 300 times slower than you expect'.
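
    The trap-and-emulate idea can be sketched in a few lines. Everything here is hypothetical: the "hardware" decodes only a couple of register ops and raises an illegal-instruction fault for anything else, and a software handler then emulates a dropped string instruction (a simplified movsb: copy one byte, advance both pointers) before execution continues.

```python
class IllegalInstruction(Exception):
    """Raised by the trimmed decoder for opcodes it does not implement."""


def hw_execute(cpu: dict, op: str, *args: str) -> None:
    """The 'hardware': only simple register ops are decoded."""
    regs = cpu["regs"]
    if op == "mov":
        regs[args[0]] = regs[args[1]]
    elif op == "add":
        regs[args[0]] += regs[args[1]]
    else:
        raise IllegalInstruction(op)


def emulate_movsb(cpu: dict) -> None:
    """Software version of a dropped string instruction."""
    regs, mem = cpu["regs"], cpu["mem"]
    mem[regs["edi"]] = mem[regs["esi"]]
    regs["esi"] += 1
    regs["edi"] += 1


EMULATORS = {"movsb": emulate_movsb}


def run(cpu: dict, program: list[tuple[str, ...]]) -> int:
    """Execute a program, emulating trapped instructions; return the trap count."""
    traps = 0
    for op, *args in program:
        try:
            hw_execute(cpu, op, *args)
        except IllegalInstruction:
            traps += 1
            EMULATORS[op](cpu, *args)  # "OS" handler emulates, then execution resumes
    return traps


if __name__ == "__main__":
    cpu = {"regs": {"eax": 1, "ebx": 2, "esi": 0, "edi": 4}, "mem": [10, 20, 30, 40, 0, 0, 0, 0]}
    prog = [("add", "eax", "ebx"), ("movsb",), ("movsb",), ("mov", "ecx", "eax")]
    print(run(cpu, prog), cpu["regs"], cpu["mem"])
```

    In a real system, each trap would also pay for an exception round trip, which is where the "much slower than you expect" warning comes from.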

    Another possibility, since this kind of chip is generally running the same program on all of the cores, is to have a single decoder and a shared instruction cache that caches micro-ops.

    --
    I am TheRaven on Soylent News
  22. Re:I doubt it by TheRaven64 · · Score: 3, Informative

    don't forget that the Pentium M and, subsequently, Core line of processors was based on Pentium III Coppermine, whereas the Pentium 4 Netburst architecture developed in the meantime was abandoned completely

    This keeps being repeated, but it is simply not true. The Core 2 is a completely new microarchitecture, and so doesn't count in this discussion, while the Core 1 is essentially identical to the Pentium M. The Pentium M, however, is not just a tweaked P3 with Netburst completely abandoned. It has a slightly longer pipeline than the P3, and it takes several important features from the Netburst architecture, including (but not limited to) the floating point and vector pipelines and the branch predictor. The Pentium M took the best parts from the P3 and P4 architectures; it didn't just throw one away.

    --
    I am TheRaven on Soylent News
  23. Re:What the hell is Larrabee? by imroy · · Score: 2, Informative

    Around 40% I believe of the original Pentium was x86 translation layer.. it was the first chip to use a RISC-like internal setup.

    No, it wasn't. The later Pentium Pro was the first Intel processor to use this method. The NexGen Nx586 was the first ever (for x86, at least). AMD's K5, launched slightly after the PPro, was AMD's own design; AMD then bought NexGen and used its Nx686 design to create the K6.

  24. On which scale.... by DrYak · · Score: 3, Informative

    It's mainly a question of "on which scale are we comparing chips".

    Yes, the x86 instruction set is utterly ugly and horribly contrived compared to nice contemporary architectures like the 68k. Computing would probably involve fewer hoops had IBM gone with Motorola for its PCs (as a lot of other home computers, arcade machines and home consoles did).

    *BUT*

    if we place GPUs on the same scale, suddenly x86 shines: it doesn't completely suck at branching, it has an actual stack that can be used to call subprocedures, it has interrupts, etc.
    It is an architecture able to run an OS.
    nVidia's CUDA machines, on the other hand, mainly use SIMD masking for most conditional operations, aren't really brilliant at branching, and completely lack any way to call subprocedures. Those chips have loads of registers, but instead of using them for register windows and RISC-style subroutine calls, they use them to keep more threads in flight.
    It definitely makes a lot of sense from a functional point of view (these are GPUs, made to process fuck-loads of pixels per second), but it makes them unable to run Linux.
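
    Here is what "SIMD masking for conditional operations" means, shown as a toy model in plain Python (not CUDA code). Every lane evaluates both sides of the branch. A per-lane mask then blends the results, so no lane ever jumps anywhere. The arithmetic blend is an illustrative choice that assumes finite inputs; real hardware uses a select operation instead.

```python
def branchy(xs: list[float]) -> list[float]:
    """Scalar reference: each element takes its own branch."""
    return [x * 2.0 if x > 0 else -x for x in xs]


def masked(xs: list[float]) -> list[float]:
    """SIMD-style: compute both sides for every lane, then blend by mask."""
    mask = [float(x > 0) for x in xs]   # one predicate per lane (1.0 or 0.0)
    then_side = [x * 2.0 for x in xs]   # computed for ALL lanes
    else_side = [-x for x in xs]        # also computed for ALL lanes
    return [m * t + (1.0 - m) * e for m, t, e in zip(mask, then_side, else_side)]


if __name__ == "__main__":
    data = [3.0, -1.5, 0.0, 2.5]
    print(branchy(data), masked(data))
```

    The cost is visible in the code: both sides always run, so a lane that fails the predicate still burns its cycles. That is cheap for short shader-style conditionals and painful for deep, divergent control flow.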

    On that scale, having x86 on a GPU suddenly makes it a lot more interesting for uses outside the usual "draw triangles very fast". Even if x86 sucks to begin with.

    And for the record: there's hardly any way the 68k architecture could ever have prevailed. It's a good one, but IBM never saw its PC as anything more than a glorified terminal. For that kind of machine, they were of course going to go for the cheapest possible chip.
    Given a choice between a half-assed chip from Intel, with a 16-bit extension quickly tacked onto a design inherited from early 8-bit chips (the 8008, the 8080 and the competing Z80; most assembly code could be mechanically translated for the 8088 after some register renaming), AND a very nice chip from Motorola, designed from the ground up as a clean 16/32-bit architecture with room for future expansion:
    Of course they picked the Intel. It was cheaper, and there was no need for a future-proof 32-bit processor in a fucking "Terminal Deluxe".
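
    To show what "after some register renaming" looks like, here is a toy translator for register-to-register MOV only. It uses the commonly cited 8080-to-8086 register correspondence (HL to BX, DE to DX, BC to CX, A to AL, M to [BX]). The function name is hypothetical, and real translation also has to handle flags, addressing and instructions without a direct equivalent.

```python
# Commonly cited 8080 -> 8086 register correspondence for source translation.
REG_MAP = {
    "A": "AL",
    "B": "CH", "C": "CL",   # BC pair -> CX
    "D": "DH", "E": "DL",   # DE pair -> DX
    "H": "BH", "L": "BL",   # HL pair -> BX
    "M": "[BX]",            # 8080 'M' = memory addressed by HL
}


def translate_mov(line: str) -> str:
    """Translate an 8080 'MOV dst,src' into 8086 syntax by renaming registers."""
    mnemonic, operands = line.split(maxsplit=1)
    if mnemonic.upper() != "MOV":
        raise ValueError(f"unsupported instruction: {line!r}")
    dst_name, src_name = (r.strip().upper() for r in operands.split(","))
    if dst_name == "M" and src_name == "M":
        raise ValueError("MOV M,M is not a valid 8080 MOV (that opcode is HLT)")
    return f"MOV {REG_MAP[dst_name]},{REG_MAP[src_name]}"


if __name__ == "__main__":
    for src in ("MOV A,B", "MOV M,E", "MOV L,A"):
        print(f"{src:10s} -> {translate_mov(src)}")
```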

    And of course, because of the (relatively) low cost, because of the (very strong) brand recognition, because of the (somewhat) open platform that enabled clones (open in the sense that it was documented; of course, Phoenix had to completely rewrite the BIOS because of copyright restrictions, but IBM considered big iron its main product and didn't mind such clones), and because they were taking on a relatively uncrowded market (most home computers were for homes, schools and small shops; PCs were marketed to corporations):
    The PC was bound to take over the market very quickly, *with* its bad design (almost *because* of it). And it was bound to set the standard, bad as that standard is.
    And by then, it was too late for IBM to adopt a better architecture and produce a "Terminal Deluxe Pro Mark-III" with a clean 68k chip.

    Of course, had the PC had a less crippled OS, designed to be slightly more extensible and to make fewer assumptions about the architecture than MS-DOS (you know, the "we laid everything out around 1MiB and thought it would last for at least 10 years" by Mr. Gates), perhaps a switch to a different, better architecture could have been less painful, and a cleaner architecture could have blessed the PC world sooner.
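
    The 1MiB ceiling comes from the 8086 itself. A real-mode address is a 16-bit segment shifted left by 4 plus a 16-bit offset, driven onto 20 address lines. This sketch models the original 8086 behaviour, where addresses past 1MiB wrap around to zero. Later CPUs could reach just past 1MiB when the A20 line was enabled, and that case is not modelled here.

```python
ONE_MIB = 1 << 20  # 2**20 bytes, the full 20-bit address space


def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 address lines."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & (ONE_MIB - 1)


if __name__ == "__main__":
    print(hex(physical_address(0xB800, 0x0000)))  # colour text-mode video memory
    print(hex(physical_address(0xFFFF, 0x000F)))  # last byte below 1MiB
    print(hex(physical_address(0xFFFF, 0x0010)))  # one byte further wraps to 0 on an 8086
```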

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]