Slashdot Mirror


Larrabee ISA Revealed

David Greene writes "Intel has released information on Larrabee's ISA. Far more than an instruction set for graphics, Larrabee's ISA provides x86 users with a vector architecture reminiscent of the top supercomputers of the late 1990s and early 2000s. '... Intel has also been applying additional transistors in a different way — by adding more cores. This approach has the great advantage that, given software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future. Larrabee takes this approach to its logical conclusion, with lots of power-efficient in-order cores clocked at the power/performance sweet spot. Furthermore, these cores are optimized for running not single-threaded scalar code, but rather multiple threads of streaming vector code, with both the threads and the vector units further extending the benefits of parallelization.' Things are going to get interesting."

196 comments

  1. Missed it by *that* much by Anonymous Coward · · Score: 0, Troll

    If developers are too stupid to code for it, it won't go anywhere. This is sounding a lot like the PS3 architecture in complexity. Parallelism is not that hard to deal with, but you have to know what you're doing. Sadly, few do. Try again in ten years.

    1. Re:Missed it by *that* much by SlashWombat · · Score: 4, Insightful

      It appears that this could well improve the speed of lots of different operations. A definite boon for graphics-like operations, but also a lot of DSP (audio/maths) stuff can benefit from these enhancements. It would also appear that general code could easily be sped up; however, compiler writers need to get their collective arses into gear for this to happen.

      However, give the average developer more speed, and all that gets produced is more bloat with less speed. If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (Smart code that actually executes quickly is generally too difficult for the dumb-arsed upper level (management) programmers to understand, and is thus removed. Believe me, I've seen this happen many times!)

    2. Re:Missed it by *that* much by SQL+Error · · Score: 1

      Fast code that doesn't work is not all that useful.

      Except in search engines.

    3. Re:Missed it by *that* much by somenickname · · Score: 4, Funny

      It appears that this could well improve the speed of lots of different operations. A definite boon for graphics-like operations, but also a lot of DSP (audio/maths) stuff can benefit from these enhancements. It would also appear that general code could easily be sped up; however, compiler writers need to get their collective arses into gear for this to happen.

      Yeah, and while they are at it, I hope they finally get around to fixing that damn segfault bug. It's been around for YEARS.

    4. Re:Missed it by *that* much by julesh · · Score: 4, Informative

      If developers are too stupid to code for it, it won't go anywhere. This is sounding a lot like the PS3 architecture in complexity.

      There are several problems with PS3 programming that don't apply to Larrabee:

      * Non-uniform core architectures. Cell processors have two different instruction architectures depending on which core your code is intended to run on. This causes quite a bit of confusion and makes the tools for development a lot more complex.
      * Non-uniform memory access. Most cell processor cores have local memory, and global memory accesses must be transferred to/from this local memory via DMA. Larrabee cores have direct access to main memory via a shared L2 cache.
      * Memory size constraints. Most cell processor cores only have direct access to 256K of memory, so programs running on them have to be very tightly coded and don't have much spare space for scratch usage.

      Any application that's reasonably parallelisable is going to be pretty easy to optimize for larrabee. Most graphics algorithms fit into this category.
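
      To make the memory-access bullet above concrete, here is a rough C++ sketch (not real Cell or Larrabee code). `sum_staged` simulates the local-store model by copying fixed-size chunks into a small buffer, with `std::copy` standing in for a DMA transfer; `sum_direct` simply walks main memory and lets the cache do the work. The buffer size `kLocalFloats` and both function names are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local-store budget, in floats. Chosen for illustration only;
// it is not the real 256K layout of a Cell core.
constexpr std::size_t kLocalFloats = 1024;

// Shared-cache style: read main memory directly.
float sum_direct(const std::vector<float>& data) {
    float s = 0.0f;
    for (float x : data) s += x;
    return s;
}

// Local-store style (simulated): stage each chunk into a small local buffer
// before touching it. std::copy stands in for the DMA transfer.
float sum_staged(const std::vector<float>& data) {
    std::array<float, kLocalFloats> local{};
    float s = 0.0f;
    for (std::size_t base = 0; base < data.size(); base += kLocalFloats) {
        const std::size_t n = std::min(kLocalFloats, data.size() - base);
        assert(n <= local.size());  // a chunk must fit in the local buffer
        std::copy(data.data() + base, data.data() + base + n, local.data());
        for (std::size_t i = 0; i < n; ++i) s += local[i];
    }
    return s;
}
```

      Both functions return the same result; the staged version just carries the extra chunking and copying that the programmer has to write and size by hand.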

    5. Re:Missed it by *that* much by mdwh2 · · Score: 5, Insightful

      If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!

      I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.

    6. Re:Missed it by *that* much by Thundersnatch · · Score: 1

      If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (smart code that actually executes quickly is generally too difficult for the dumb-arsed upper (management)

      No, we (management) understand your shifty obfuscated code just fine. It's just that you are too stupid to grasp basic economics.

      One week of a developer's time, fully allocated, costs the same as a decent app server. So optimizing code for maintenance is far more cost effective than optimizing for performance.

    7. Re:Missed it by *that* much by speedtux · · Score: 1

      compiler writers need to get their collective arses into gear for this to happen

      There's a limit to how much general purpose C/C++ code can be sped up automatically; C/C++ semantics just don't allow a lot of optimizations.

    8. Re:Missed it by *that* much by Anonymous Coward · · Score: 0

      Using "optimal" algorithms is often the wrong thing to do if you're going for speed.

    9. Re:Missed it by *that* much by mrfaithful · · Score: 2, Informative

      If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!

      I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.

      Up to a point, then you've got to make a choice. Keep the high level OOP constructs, or flatten it out to make the compiler's job easier.

      THEN you have the next level of optimization, keep the readable code or do it the "clever" way that nets a 40% boost. And as any experienced coder will tell you, clever code is the antithesis of maintainable.
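
      A rough C++ sketch of the "flatten it out" trade-off described above. The `Shape`/`Rect`/`RectBatch` types are invented for illustration. The first version keeps the OOP construct (one virtual call per object); the second stores the same data as plain arrays, which gives the compiler a simple loop it has a much better chance of vectorizing. Whether it actually does depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// High-level version: polymorphic objects, one virtual call per element.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Rect : Shape {
    float w, h;
    Rect(float w_, float h_) : w(w_), h(h_) {}
    float area() const override { return w * h; }
};

float total_area_oop(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float total = 0.0f;
    for (const auto& s : shapes) total += s->area();
    return total;
}

// Flattened version: the same rectangles stored as two contiguous arrays.
struct RectBatch {
    std::vector<float> w, h;
};

float total_area_flat(const RectBatch& b) {
    assert(b.w.size() == b.h.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < b.w.size(); ++i) total += b.w[i] * b.h[i];
    return total;
}
```

      The flat version is arguably still readable here, but it only handles one kind of shape; that loss of generality is the choice being described.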

    10. Re:Missed it by *that* much by rbmyers · · Score: 1

      Your comment is funny, except that it isn't. Coders who understand really very little about the history and how hard really smart people have tried and failed think they are smarter than everyone else, including their managers, who are interested in stupid things like maintainability--even at the expense of the egos of cosmically all-knowing coders.

    11. Re:Missed it by *that* much by mikael · · Score: 2, Interesting

      If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!

      I've worked in a couple of companies like that - usually the programmers were limited to working on technology that the management (ex-programmers) were familiar with. Then also, management didn't want the programmers learning "high-demand skills" (i.e. hardware programming) that would boost the chances of their staff leaving for a better-paid environment. Or there was the politics of favoritism where the directors wanted to give a leg up the seniority ladder to their best mate. Everyone else who was qualified "didn't have the skills or was busy on another project" while of course their mate "had applied at just the right time with the right skills". Another problem was that if management gave only one programmer a new hardware system (e.g. a CPU porting project), then everyone else would get so cheesed off that they were falling behind that they would leave. Alternatively, there are also quota-based systems which would piss off one nationality or another.

      Invariably these companies gain a bad reputation and implode after a slow death spiral, where they are forced to lay off staff and sell off equipment to cover debts. With fewer staff, they can't take on new projects, and the cycle continues until the last project is cancelled.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    12. Re:Missed it by *that* much by Pseudonym · · Score: 1

      First off, there is no such language as "C/C++".

      Secondly, there is one clear advantage of C++ over C here: Vector operations can be exploited by library writers in a seamless way. Someone could, for example, rewrite std::valarray to exploit the new instructions fairly easily, and existing programs which use it will Just Work(tm).

      Having said that, it's more likely that the first libraries to use them will be those based on BLAS.
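
      For anyone who hasn't used it, a minimal sketch of what code written against `std::valarray` looks like. The `saxpy` function name is just an example. Nothing here is Larrabee-specific, and whether the whole-array operations end up as vector instructions depends entirely on the library implementation and the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Computes a*x + y element-wise using whole-array operations.
// The caller never writes a loop, so the library is free to implement
// the operators however it likes (scalar loop, SIMD, and so on).
std::valarray<float> saxpy(float a, const std::valarray<float>& x,
                           const std::valarray<float>& y) {
    assert(x.size() == y.size());
    std::valarray<float> result = a * x;  // scalar * array
    result += y;                          // array += array
    return result;
}
```

      The point is that the calling code stays the same if the library underneath is rewritten to use new instructions.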

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    13. Re:Missed it by *that* much by x2A · · Score: 1

      "First off, there is no such language as "C/C++"."

      Now, see, that just makes you sound like an idiot, trying to be clever. If you were aiming for humour, swing/miss. Next you gonna tell us there's no such word as 'swing/miss'?
      Jeez.

      --
      The revolution will not be televised... but it will have a page on Wikipedia
    14. Re:Missed it by *that* much by Jesus_666 · · Score: 1

      Until you need to make a minor change to the code and find out you have to redevelop a module from scratch because the people responsible for it optimized it until nobody but the original developer (who has since left the company) has a chance of understanding the code in a reasonable timeframe.

      Of course this can be avoided through really thorough documentation, but few people are enthusiastic about documenting every step and then documenting the optimizations applied to it.

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
    15. Re:Missed it by *that* much by Anonymous Coward · · Score: 0

      Until you need to make a minor change to the code and find out you have to redevelop a module from scratch because the people responsible for it optimized it until nobody but the original developer (who has since left the company) has a chance of understanding the code in a reasonable timeframe.

      Of course this can be avoided through really thorough documentation, but few people are enthusiastic about documenting every step and then documenting the optimizations applied to it.

      You know what else lets you avoid the problem? Not making the original (genius or not) developer bother with the documentation, without giving him time to document it before giving him yet another project. That's what makes him leave. Just hire an English major to sit behind him with a screen recording app, transcribing his muttered explanations of why this loop spins 4 times instead of 3 before spitting out the memory contents of the next Dword.

      If your business model requires you to churn through developers who will never stick around, don't expect your project to work. The great ones will leave and the good ones will never come.

    16. Re:Missed it by *that* much by Pseudonym · · Score: 1

      No, I know both languages fairly well, I know they're almost incomparable and am constantly annoyed by people who assert that C++ is just C with a few bits added.

      It's a little-known fact, for example, that C++ has rules which say that the compiler can assume that two pointers aren't aliased under certain circumstances, where C compilers can't make that assumption because of C's weaker type system. Pointer aliasing is one of those issues that can prevent the compiler from generating SIMD or vector instructions.
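
      A small sketch of why aliasing matters for vectorization in general. It does not try to pin down where the C and C++ rules differ, which varies by standard and compiler; the function names are made up. In `add_same_type` the two pointers may legally overlap, so the compiler has to preserve element-by-element order (or add a runtime overlap check) before it can use vector instructions. In `scale_by_count` the pointers have different types, so type-based aliasing rules let the compiler assume that writing through `dst` cannot change `*count`.

```cpp
#include <cassert>
#include <cstddef>

// Same element type: dst and src are allowed to overlap, and the result
// depends on the order of the updates when they do.
void add_same_type(float* dst, const float* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
}

// Different types: a store through float* is assumed not to modify an int,
// so the loop bound can be read once and kept in a register.
void scale_by_count(float* dst, const int* count) {
    assert(dst != nullptr && count != nullptr);
    for (int i = 0; i < *count; ++i) dst[i] *= 2.0f;
}
```

      Calling `add_same_type(buf + 1, buf, 3)` on `{1, 1, 1, 1}` gives `{1, 2, 3, 4}`, because each step reads the value the previous step just wrote. A naive vectorized version that loaded all of `src` first would give `{1, 2, 2, 2}`, which is why the compiler has to be careful.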

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    17. Re:Missed it by *that* much by x2A · · Score: 1

      "am constantly annoyed by people who assert that C++ is just C with a few bits added"

      No one was asserting such a thing here. The poster was talking about "general purpose C/C++ code" which is as perfectly valid as, for example, someone talking about website scripting in perl/php, also two completely different languages, but with overlap of what they can be used for, and it's that overlap that's referenced as the subject. However different they can be, C/C++ have undeniable massive overlap. Grouping and generalising are perfectly valid legal communicating constructs completely necessary in the conveying of ideas. Complaining that differences between elements within a generalisation are omitted, when the differences fall outside the context of what is being discussed, is basically just waving a big "missed the point" flag, which happens... but it doesn't have to be accompanied by abrasion.

      --
      The revolution will not be televised... but it will have a page on Wikipedia
    18. Re:Missed it by *that* much by chrysrobyn · · Score: 1

      You always have to pick your priorities. Choosing maintainability is good for bug hunting and repair. Beyond that, you can choose speed, disk space, memory footprint, or any number of other factors. Where I work, maintainability is very important, so we use Perl. A whole lot of what I do would be an order of magnitude faster if it was done in C, but not everybody here knows C (and we're not a software shop, so that's not a job requirement). Heck, even in Perl, there are dozens of ways to do things, and there are even different ways to write the same exact algorithm with the same readability but one is faster than the other due to how it's interpreted.

      There are situations where maintainability is less important than other priorities, but it's clear that slow, easy to read (and debug and fix) has its place in the market.

  2. An architecture named after a Get Smart character? by Eternal+Vigilance · · Score: 1

    Bet they've got some serious CONTROL structures to keep things from getting too KAOTIC....

    "Would you believe a GOTO statement and a couple of flags?"

  3. Had a flashback there. by palegray.net · · Score: 5, Funny

    The story title conjured up images of the boxes of ISA cards I've still got sitting around. Ah, the joys of setting IRQs... good times.

    1. Re:Had a flashback there. by 4181 · · Score: 2, Informative

      It's probably worth noting that although the actual article uses neither the acronym nor its expansion, ISA in the story title refers to Instruction Set Architecture. (My first thoughts were of ISA cards as well.)

    2. Re:Had a flashback there. by palegray.net · · Score: 2, Funny

      Yeah, I just got weird mental images of ISA cards jutting out of modern-day motherboards. It was disturbing.

    3. Re:Had a flashback there. by Anonymous Coward · · Score: 1, Funny

      So if this turns out to be good, will everyone jump on to the ISA bus?

    4. Re:Had a flashback there. by Anonymous Coward · · Score: 0

      I still remember the good old Sound Blaster 16

      Port 220
      IRQ 5
      DMA 1
      DMA(hi) 5

    5. Re:Had a flashback there. by fuzzyfuzzyfungus · · Score: 1

      Didn't you hear? As of PCIE 3.1, compliant PCIE controllers will support detection, pin remapping, and an entire emulated 8086, allowing valuable legacy 8-bit ISA cards to remain in use!

  4. Re:An architecture named after a Get Smart charact by MichaelSmith · · Score: 1

    Bet they've got some serious CONTROL structures to keep things from getting too KAOTIC.... "Would you believe a GOTO statement and a couple of flags?"

    How about a while loop and a continue statement?

  5. Duh by symbolset · · Score: 4, Interesting

    That's what libraries, toolsets and custom compilers are for. If the problem was just silicon we'd have Larrabee by now. What's holding up the train is the software toolchain and software licensing issues.

    Don't worry, though. On launch day the tools will be mature enough to use, and game vendors will have new ray tracing games that look fabulous on nothing but this.

    I'm hoping the tools will be open but that's a long bet. If they are, Microsoft is done as the game platform for the serious gamer and Intel will make billions as they take the entire graphics market. Intel will make hundreds of millions regardless and a bird in the hand is worth two in the bush, so they might partner in a way that limits their upside to limit their downside risk. That would be the safe play. We'll see if they still have the appetite for risk that used to be their signature. I'm hoping they still dare enough to reach for the brass ring.

    --
    Help stamp out iliturcy.
    1. Re:Duh by ThePhilips · · Score: 1

      Tools would mean dick and it would take a long time for developers to actually adopt such tools to exploit the architecture.

      As an example, take Sun's recent highly multi-threaded CPUs, the T1 and T2. Our benchmark team ran our software (essentially message processing) on these CPUs and it is dog slow unless you disable the CPU multi-threading. Java, on the other hand, has seen a good boost in performance, because Sun's JIT can already optimize code on the fly for such an architecture.

      It is a long road for Larrabee before general acceptance. Let's just hope that Intel won't make the same mistake twice.

      --
      All hope abandon ye who enter here.
    2. Re:Duh by fuzzyfuzzyfungus · · Score: 2, Insightful

      We should remember that, at least at first, Larrabee is being pushed as a GPU, not a CPU, and all contemporary GPUs are already substantially parallel. If anything, intel's is (unsurprisingly) by far the most conventionally CPU-like of the bunch.

    3. Re:Duh by davolfman · · Score: 2, Interesting

      Raytracers are surprisingly easy to code. There ought to at least be a demo app when the time comes.
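
      As a rough illustration of how little code the core of a ray tracer needs, here is a sketch of a ray-sphere intersection test in C++. The `Vec3` type and function names are invented for this example; a real renderer adds shading, a camera, and acceleration structures on top.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Returns the nearest positive distance t at which the ray origin + t*dir
// hits the sphere, or -1.0 if it misses. dir must be unit length.
double hit_sphere(const Vec3& origin, const Vec3& dir,
                  const Vec3& center, double radius) {
    assert(std::fabs(dot(dir, dir) - 1.0) < 1e-9);
    const Vec3 oc = sub(origin, center);
    // Solve t^2 + 2*b*t + c = 0 for t.
    const double b = dot(oc, dir);
    const double c = dot(oc, oc) - radius * radius;
    const double disc = b * b - c;
    if (disc < 0.0) return -1.0;          // ray misses the sphere
    const double root = std::sqrt(disc);
    double t = -b - root;                 // nearer intersection
    if (t > 0.0) return t;
    t = -b + root;                        // ray starts inside the sphere
    return t > 0.0 ? t : -1.0;
}
```

      Each ray is tested independently of every other ray, which is why this kind of workload maps well onto many cores and wide vector units.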

    4. Re:Duh by AceJohnny · · Score: 1

      Don't worry, though. On launch day the tools will be mature enough to use, and game vendors will have new ray tracing games that look fabulous on nothing but this.

      Huh, that thinking sank another of Intel's efforts, the Itanium. It was an architecture that required explicitly parallel code (generated by the compiler). After sending the first samples to labs and universities and anybody interested in making a compiler for it, they thought everything would be good.

      Except the awesome compilers didn't materialize. Itanic sank.

      Do. Not. Expect. Compilers to solve your problems for you. If we could, it would be done by now, and we could make awesome optimizing compilers that would target x86, PowerPC, CUDA, whatever, from that snippet of maintainable high-level code you wrote.

      It doesn't exist. Don't wait for it. Toolchains suck and will always suck (relative to what you're expecting).

      --
      Misleading titles? Inflammatory blurbs? Keep in mind that Slashdot is a tabloid.
    5. Re:Duh by gfody · · Score: 1
      --

      bite my glorious golden ass.
    6. Re:Duh by setagllib · · Score: 1

      Well that's not quite true. It depends entirely on *how* you express your program logic. You can't make a C compiler that makes use of a GPU, but you can make use of a mathematical kernel compiler which can use x86 and a GPU to almost their fullest performance.

      That's where the trend is going with OpenCL. OpenCL will let you run mathematical code reasonably well on a CPU (maybe not as well as equivalent C, but that gap should close), and without modification, run it stupidly well on a GPU. So you'd just link it with your Python program and benefit from whatever hardware/software combination you have available, (hopefully) without having to modify the Python or the math kernel.

      --
      Sam ty sig.
  6. WTF. I do not want moar x86. by MostAwesomeDude · · Score: 0

    Seriously, most of the Mesa shader assemblers deal with very limited, simple, straightforward shader ISAs. This is icky. We're gonna need a full-on compiler for this.

    --
    ~ C.
    1. Re:WTF. I do not want moar x86. by joib · · Score: 3, Interesting

      Isn't this exactly what Gallium3d + LLVM GLSL compiler is giving you? Heck, even with the simple shader ISA's you probably want an optimizing compiler anyway in order to get good GLSL performance, no?

      Wouldn't this actually be a good thing; instead of spending all the time developing new drivers for each generation of hw (changing every 6 months, poorly if at all documented), you could just keep on developing the architecture and improve the x86 backend.

    2. Re:WTF. I do not want moar x86. by palegray.net · · Score: 0

      We need compilers that are far better at taking advantage of parallel architectures anyhow.

    3. Re:WTF. I do not want moar x86. by Cyberax · · Score: 1

      Will LLVM help with this? AFAIR, Gallium3D already supports rendering using Cell PPUs and Larrabee is going to look like them.

      PS: thanks for your work on Gallium3D!

    4. Re:WTF. I do not want moar x86. by julesh · · Score: 1

      Seriously, most of the Mesa shader assemblers deal with very limited, simple, straightforward shader ISAs. This is icky. We're gonna need a full-on compiler for this

      If you don't need the extra complexity of an x86 core, you can ignore it. Compilers for this system will be just as simple as compilers for current nvidia/ati designs.

    5. Re:WTF. I do not want moar x86. by makomk · · Score: 1

      I'm not convinced. It seems like Intel have just bolted a decent ISA for graphics work onto the side of x86. In typical graphics applications, I'm guessing the new graphics instructions will do most of the work - particularly for shaders. They seem fairly decent and sane, quite similar to other modern designs (though probably more flexible).

      Now if you want real fun, try getting good performance out of r600 and up Radeon cards. Nasty VLIW architecture with all sorts of strange and interesting restrictions.

    6. Re:WTF. I do not want moar x86. by Anonymous Coward · · Score: 0

      Gallium3d interface/driver is only on Linux from what I understand so it's pretty useless for game developers that are trying to sell games.

  7. End of an era by pkv74 · · Score: 2, Interesting

    This 300-watt monster, an 8086/386/586/x86-64/mmx+sse+sse2+sse3+whateversse compatible mess, represents (or should represent) the end of an era. Few people are asking for that kind of product; price and size are more important. It's just Intel trying to hold the market captive forever.

    1. Re:End of an era by Repossessed · · Score: 2, Informative

      Intel actually tried to build a different, leaner instruction set, IA64. The market rejected it.

      Via and AMD don't have much trouble implementing these instruction sets either, or adding their own, so this doesn't much represent a stranglehold move on Intel's part.

      If you really want cheap small processors with no extra instruction sets, Intel does still make Celerons, I dare you to run Vista on one.

      --
      Liberte, Egalite, Fraternite (TM)
    2. Re:End of an era by Hal_Porter · · Score: 3, Informative

      Actually the key patents on x86 probably run out soon. x64 has always been licensable from AMD. And an AMD or Intel x86/x64 chip has been at the top of the SpecInt benchmark for most of the last few years. Plus Itanium killed off most of the Risc architectures and x64 looks likely to kill off or nicheify Itanium.

      Meanwhile NVidia are rumoured to be working on a Larrabee like chip of their own. Via have a ten year patent license, by which point the architecture is rather open. And Larrabee shows a chip with a lot of simple x86 cores is good enough at graphics for most people to not need a powerful GPU. I bet a Larrabee like CPU would be great in a server too, and it's trivially highly scalable by changing the number of cores.

      I'd say x86/x64 will be around for a long time.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    3. Re:End of an era by Anonymous Coward · · Score: 5, Insightful

      IA64 wasn't rejected because it was too lean. It's actually a horrendously complicated ISA which requires the compiler to do a lot of the work for it, but it turns out that compilers aren't very good at the sort of stuff the ISA requires (instruction reordering, branch prediction etc.). It also turned out that EPIC CPUs are very complex and power-hungry things, and IA32/x86-64 had easily caught up with and surpassed many of the so-called advantages that Intel had touted for IA64.

      The only reason Itanium is still hanging around like a bad smell is because companies like HP were dumb enough to dump their own perfectly good RISC CPUs on a flimsy promise from Intel, and now they have no choice.

    4. Re:End of an era by ThePhilips · · Score: 1, Interesting

      Plus Itanium killed off most of the Risc architectures and x64 looks likely to kill off or nicheify Itanium.

      This is misinformed B.S. Itanium didn't kill anything.

      That was (and is) triumphant march of Linux/x64 all the time.

      It is true that Intel and HP made sacrificial lambs of PA-RISC and Alpha on Itanic's altar. Yet Itanic never caught up (and never will) to the levels where both PA-RISC and Alpha were in their time.

      I bet a Larrabee like CPU would be great in a server too, and it's trivially highly scalable by changing the number of cores.

      Servers are I/O heavy - CPU parallelism is very secondary. I doubt Larrabee would make any dent in the server market. Unless, of course, OnLive or similar catches on, or Intel adds something interesting for e.g. XML processing.

      --
      All hope abandon ye who enter here.
    5. Re:End of an era by Hurricane78 · · Score: 5, Funny

      So that is where the term "EPIC FAIL" comes from...

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    6. Re:End of an era by SpazmodeusG · · Score: 4, Informative

      Look, I hate to be anal, but neither Intel nor AMD has been at the top of the SpecInt benchmark for a long time.
      The stock IBM Power6 5.0Ghz CPU is the fastest CPU on the specint benchmark on a per-core basis (and before that it was the 4.7Ghz model of the same CPU that was the leader).

      http://www.spec.org/cpu2006/results/res2008q2/cpu2006-20080407-04057.html
      Search for: IBM Power 595 (5.0 GHz, 1 core)
      Which is telling considering it's made on a larger process than the fastest x86 (the i7). It really shows there's room for improvement if you ditch the x86 instruction set.

    7. Re:End of an era by turgid · · Score: 3, Informative

      Intel actually tried to build a different, leaner instruction set, IA64. The market rejected it.

      It wasn't lean at all. It is typical over-complicated Intel junk. Just look at the implementations: itanic. It's big, hot, expensive, slow...

      If you really want cheap small processors with no extra instruction sets, Intel does still make Celerons, I dare you to run Vista on one.

      The Celerons have all the same instructions as the equivalent "core" processors, they just have less cache usually.

      This Larrabee thing doesn't sound much different to what AMD (ATi) and nVidia already have. A friend of mine has done some CUDA programming and, from what he says, it sounds just the same. Just like a vector supercomputer from 10 years ago.

    8. Re:End of an era by Anonymous Coward · · Score: 1

      The difference is in threading: traditional GPUs are SIMD machines, which means that they handle lots of data at the same time but only perform a single operation on the entire data range. Recent incarnations support MIMD operations and conditional execution, but they still have single-threaded, single-context execution units.

      If Intel really manages to build a multithreaded architecture with the same processing power as current (GP)GPUs, they might just succeed with their approach.

      But to be honest, I'd much rather see any other company win this market segment, if only for the sake of competition. And of course, I want the x86 ISA to die. Having a 64-bit CPU still boot in 16-bit compatibility mode is just madness.
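
      A rough C++ sketch of the "single operation on the entire data range" model plus conditional execution, emulated with plain loops. The lane count `kLanes = 16` (sixteen 32-bit lanes) is an assumption made for the example, and the type and function names are invented. Instead of branching per element, a comparison produces a per-lane mask, and the next operation only updates the lanes where the mask is set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 16;  // assumed vector width, for illustration
using Vec = std::array<float, kLanes>;
using Mask = std::array<bool, kLanes>;

// One "instruction" over all lanes: compare each lane against zero.
Mask less_than_zero(const Vec& v) {
    Mask m{};
    for (std::size_t i = 0; i < kLanes; ++i) m[i] = v[i] < 0.0f;
    return m;
}

// Masked negate: lanes with the mask set are negated, the rest pass through.
Vec masked_negate(const Vec& v, const Mask& m) {
    Vec out = v;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (m[i]) out[i] = -v[i];
    return out;
}

// abs() over a buffer, one vector at a time, with no per-element branch in
// the "vector" code itself. The buffer must be padded to a multiple of kLanes.
void abs_in_place(std::vector<float>& data) {
    assert(data.size() % kLanes == 0);
    for (std::size_t base = 0; base < data.size(); base += kLanes) {
        Vec v{};
        for (std::size_t i = 0; i < kLanes; ++i) v[i] = data[base + i];
        const Vec r = masked_negate(v, less_than_zero(v));
        for (std::size_t i = 0; i < kLanes; ++i) data[base + i] = r[i];
    }
}
```

      Every lane still executes the same instruction stream; the mask only decides which results are kept. Independent threads, each with their own instruction stream, are a separate layer on top of this.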

    9. Re:End of an era by julesh · · Score: 1

      Servers are I/O heavy - CPU parallelism is very secondary

      I take it you've never tried to run a large-scale J2EE app.

    10. Re:End of an era by Rockoon · · Score: 1, Insightful

      Some people buy 300watt video cards..

      ..and some of them don't even do it for gaming, but instead for GPGPU

      This is a real market, and as it matures the average joe will find that it offers things that they want as well.

      The fact is that as long as even a small market exists, that market can expand under its own momentum to fill roles that cannot be anticipated.

      I certainly wasn't thinking that there was a market for hardware accelerated graphics 20 years ago, yet I make sure that's in the system I build today.

      I certainly wasn't thinking about multi-core computers 20 years ago, yet I wouldn't buy anything less than a quad core today.

      I certainly wasn't thinking about going from 20-bit to 32-bit to 64-bit addressing 20 years ago.. I was happy with 640K and some bios above that, yet today if I build a system it's going to have at least 8 gigs of memory.

      I couldn't even dream of filling the 40meg hard drive I got with my first 386, yet now I am wondering if I should clean up my 500GB drive or simply buy a new 1TB drive to slot right next to it.

      Yeah.. people weren't asking for those kinds of products either.. now we want them because of those pesky unpredictable uses that come up that are actually attractive to us.

      --
      "His name was James Damore."
    11. Re:End of an era by hairyfeet · · Score: 1

      If this thing really does need 300 watts just to power the card, I predict it will go down in flames. The cards by Nvidia and AMD have made major strides on heat/power and Intel wants us to go back to 300 watt GPUs? WTF? I for one don't want or need another Intel space heater, thank you very much.

      Not to mention this thing looks different enough from ATI/Nvidia cards that if it gains a foothold we could end up just like the bad old days of Win9x. In case any youngsters don't remember there was this little thing called Glide which was a proprietary 3DFX tech that would only work on their cards, so you ended up with games that looked like shit if you didn't have a Voodoo card or games that supported the other cards but ran like shit on Voodoo. I personally like the fact that I can buy either Nvidia or ATI and have games both past and present work and have comparable graphics. This thing looks to be different enough that I'm betting code not written specifically for it will probably take a serious performance penalty. No thanks.

      While I had hopes when it was first announced that Larrabee would be simply a new low power GPU by Intel that wouldn't suck, instead it looks to be another Netburst space heater. Meanwhile with the Ion and the 780 Nvidia and AMD have made great strides in the IGP space that currently is ruled by Intel. Between this and the way Intel seems to be burning their bridges with lawsuits from Nvidia and AMD one really has to wonder just what the hell they are thinking. Intel with their crazy maneuvers has made up my mind and for the first time since the old K2 I'm building an AMD box next month to replace my aging P4. Because with all this craziness who knows if an Intel box would support the GPUs that I want to run? Because just from what I have seen and read so far Larrabee simply isn't the GPU for me. So sorry Intel, but the insanity you have been pulling for the past few months has cost you this lifelong Intel man. Stick to the CPU and let companies with much more experience with graphics stick to the GPU.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    12. Re:End of an era by Hal_Porter · · Score: 1

      Here are the SpecInt scores

      http://www.onscale.de/specbrowser/2006-i.html

      The top score is an Intel Xeon X5570 at 2933Mhz
      SPECint2006 = 36.3
      SPECint_base2006 = 32.2

      Look at SpecFP

      http://www.onscale.de/specbrowser/2006-f.html

      The same chip is on top there too

      SPECfp2006 = 42.0
      SPECfp_base2006 = 39.3

      The results you linked to for a 5GHz PPC were

      SPECfp2006 = 24.9
      SPECfp_base2006 = 20.1

      So even at SpecFP where Risc has traditionally been quite a bit ahead, x86 is now on top. On SpecInt it's been like that for ages, at least since Athlon64.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    13. Re:End of an era by SpazmodeusG · · Score: 3, Informative

      I said on a per-core basis!
      The Xeon X5570 is a quad core machine!

    14. Re:End of an era by ThePhilips · · Score: 1

      With the same success, I can run "for(;;);" in several threads and run all CPUs/cores into the ground.

      No matter what Java folks try to make out of it, Java on servers is pretty niche - precisely because of inefficient use of resources.

      Of course, server-side Java is not a niche within the Java market as a whole. But the reverse does not hold: Java is still a niche within the server market.

      --
      All hope abandon ye who enter here.
    15. Re:End of an era by sznupi · · Score: 2, Interesting

      And what about on a per-Watt basis? (honest question here; though I do suspect i7 is quite a bit more competitive here)

      --
      One that hath name thou can not otter
    16. Re:End of an era by godefroi · · Score: 2, Informative

      Here's a little secret:

      Lots of games (maybe all of them) already include graphics-vendor-specific rendering engines. It's just that nowadays your graphics API isn't your whole game development toolset (glide), so it's easy to include support for both (all) vendors.

      --
      Karma: Poor (Mostly affected by lame karma-joke sigs)
    17. Re:End of an era by Anonymous Coward · · Score: 0

      It doesn't matter unless you're looking at the SPEC rate scores, because only one core is used.

    18. Re:End of an era by ebuck · · Score: 1

      It's not that HP is dumb, it's greedy. HP owns something on the order of 50% of the IP that goes into an Itanium. If they can effectively block you from buying anything else, you buy into their patents. Intel is the other major patent holder.

      Most of the patents for the Itanium are designed to make it impossible to produce an Itanium clone without violating the patents.

    19. Re:End of an era by binarylarry · · Score: 1

      "Java on servers is pretty niche"

      ROFL

      --
      Mod me down, my New Earth Global Warmingist friends!
    20. Re:End of an era by the_humeister · · Score: 1

      Because there weren't any low-power 64-bit PowerPC processors at the time and Apple couldn't wait.

    21. Re:End of an era by the_humeister · · Score: 1

      What I find interesting is that Intel tried this thing before, but it was called the iAPX 432 back in the '80s. It failed miserably back then, but is only somewhat more successful now.

      Also, I think it was HP that approached Intel to make Itanium, not the other way around.

    22. Re:End of an era by forkazoo · · Score: 2, Informative

      This is misinformed B.S. Itanium didn't kill anything.

      That was (and is) triumphant march of Linux/x64 all the time.

      Itanium killed high end MIPS years before anybody was talking about x64. You mentioned PA RISC, and Alpha was dead in practice long before HP ever had it to officially declare it dead. Itanium killed a lot of good architectures.

    23. Re:End of an era by ZosX · · Score: 1

      oh yeah...i did forget about the problems they had trying to get the g5 in laptop form while intel had the cool and efficient cores in the wings. thanks for reminding me. it doesn't make me any sadder to see the power go.

    24. Re:End of an era by Anonymous Coward · · Score: 0

      Actually there was one company who built PPC64 processors with very good performance/watt: PASemi, that Apple bought about one year ago. On a 65nm process, they still trounce any Intel offering in performance/watt. It was a dual core, with dual DDR2 memory controller and 24 PCIe lanes (splittable almost at will, like 1x16+1x8, or 1x16+2x4). Some of the PCIe lanes could be configured as GigE or even 10GbE (up to 2, although a single 10GbE would eat up 4 lanes).
      It had other niceties, like an encryption unit I believe and some TCP offload.

      A very nice chip, whose maximum power consumption was 25W at 2GHz with all the units at full steam. Typical was closer to 15W; the 25W value is really for the power supply designers (if you look at a Core i7 spec, the 130W TDP actually translates into 200W for the power supply designer, needing 145A of core supply current).

      Really, Apple could have built nice laptops with that chip (not with the 8641D which needs more power, although it was competitive in terms of performance with the Core Duo, the G4 being completely hampered by their front bus frequency, not by the core itself).

      P.S: typing this on my 8 year old Pismo...

    25. Re:End of an era by LWATCDR · · Score: 1

      That will depend on the server. Encryption could benefit from a Larrabee-like system, as could things like software RAIDs. With extra CPU power available, software RAIDs and advanced file systems like ZFS could replace hardware RAIDs everywhere.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    26. Re:End of an era by i.of.the.storm · · Score: 1

      Uh, I would not call IA64 leaner; VLIW is a huge mess and forces the compiler to do a lot of optimization, and if it can't do the optimization then performance sucks. Of course the market rejected it.

      --
      All your base are belong to Wii.
    27. Re:End of an era by Macman408 · · Score: 1

      On the other hand, the high-end product lines make up a significant portion of the profits for chip companies. They charge a huge premium for perfect chips that run at high speeds and low (or relatively low) power and temperature. The chips that aren't as perfect get sold for much less to the masses, running at lower clocks, lower voltages, and even with features disabled (as in the case where a chip has a defect).

      Because of this, chipmakers will probably continue to have at least one product for the high-end market, and the benefits will continue to trickle down to the rest of us.

    28. Re:End of an era by ozbird · · Score: 1

      It also turned out that EPIC CPUs are very complex and power-hungry things, and AMD64 had easily caught up with and surpassed many of the so-called advantages that Intel had touted for IA64.

      Fixed that for you.

    29. Re:End of an era by Bigjeff5 · · Score: 0

      I said on a per-core basis!
      The Xeon X5570 is a quad core machine!

      That's like running a race between a unicycle and a quadcycle, and claiming the unicycle is obviously the superior transportation because it has more hp per-wheel.

      Your caveat is idiotic. If the Power6 can only have 1 core in a chip, it should be judged on a per-chip basis, not a per-core basis as the chip is the lowest level it can be sub-divided into.

      Now, if you've got a dual or quad core Power6 (quad would be apples to apples) then a per-core comparison is reasonable.

      Quit using caveats to show the lesser competitor is superior. It's dumb and dishonest.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    30. Re:End of an era by cizoozic · · Score: 1

      So that is where the term "EPIC FAIL" comes from...

      Indeed, I've heard it used with Itanic more than a few times.

    31. Re:End of an era by Anonymous Coward · · Score: 0

      When pkv74 said "leaner", he said not only "watts", but "price and size".

      At release, the cheapest IA64 processor was over $1000, going to over $4000 for a high-end one. Less than a year later, the exact same processors (speed and cache) had *increased* in price by thousands of dollars. (I don't know what x86 CPUs were going for in 2001, but I do remember we were buying complete Dell systems for a little over $1000. They weren't very good machines, but they did what we needed, and they were pretty cheap.)

      I don't know much about Itanium, but I do know that tripling the price of a CPU model over the course of a year is not the way to compete on price.

      Another way to look at it: buying Itaniums (and then not using them) in May 2001 would have been an *investment*. If the Underpants Gnomes had switched to stealing Itaniums, step 2 would have been "let them age for a year". Can you think of any other CPU in history for which this was true? :-)

    32. Re:End of an era by Anonymous Coward · · Score: 0

      Software RAIDs need CPU power? Since when? They just need insane IO bandwidth. One 2 GHz core can XOR 32 gigabytes per second in cache... or 3 gigabytes per second from memory.

      You're right about encryption, though. One core can do just 50 MB/s AES128 or 50-100 RSA 2048 signs per second. In actual applications (OpenSSL), even less.
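
      For anyone wondering why the RAID side is so cheap, here is a minimal sketch in C of RAID-5 style parity, assuming byte-wise XOR over equally sized blocks. The function names and the block layout are made up for illustration and are not taken from any real RAID driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parity[i] = data[0][i] ^ data[1][i] ^ ...
 * One XOR per byte per disk, so the loop is limited by memory
 * bandwidth long before it is limited by the ALU. */
void xor_parity(const uint8_t *const *data, size_t ndisks,
                size_t block_len, uint8_t *parity)
{
    assert(ndisks > 0);
    for (size_t i = 0; i < block_len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndisks; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* Hypothetical helper: rebuild one lost block by XORing the parity
 * block with every surviving data block. */
void xor_rebuild(const uint8_t *const *surviving, size_t nsurviving,
                 const uint8_t *parity, size_t block_len, uint8_t *out)
{
    for (size_t i = 0; i < block_len; i++) {
        uint8_t p = parity[i];
        for (size_t d = 0; d < nsurviving; d++)
            p ^= surviving[d][i];
        out[i] = p;
    }
}
```

      The same loop rebuilds a failed disk, because XORing the parity with the surviving blocks cancels everything except the missing one.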

    33. Re:End of an era by Ant+P. · · Score: 1

      I personally like the fact that I can buy either Nvidia or ATI and have games both past and present work and have comparable graphics. This thing looks to be different enough that I'm betting code not written specifically for it will probably take a serious performance penalty.

      I personally like the fact that I can go out and buy anything with Intel graphics on it and not have to worry about whether a driver exists, whether it'll run KDE 4 at 2 frames per minute, whether I'll have to wait half a year before they release a driver for the current version of X, etc. ATi are trying to match that level of support, but nVidia? Fuck nVidia.

    34. Re:End of an era by SpazmodeusG · · Score: 1

      Well if you really want to play that game here's the power6 64-core benchmark.
      http://www.spec.org/cpu2006/results/res2008q2/cpu2006-20080407-04058.html

      1820 SpecInt marks. Well over 20 times the score of the x5570 in every category.

      Of course the number of cores can be whatever the box maker wants it to be. So it's a stupid comparison. The only fair comparison is a per-core comparison.

    35. Re:End of an era by Anonymous Coward · · Score: 0

      Not just integer but also floating point; the linked chart shows older Power6 4.7GHz chips schooling Intel's Nehalem chips in floating point performance:

      http://www.onscale.de/specbrowser/2006-fr-004.html/

    36. Re:End of an era by Anonymous Coward · · Score: 0

      Quad core Power6 systems from 2007 beating quad core i7 Extreme Edition systems from 2009 in floating point performance:

      http://www.onscale.de/specbrowser/2006-fr-004.html/

    37. Re:End of an era by Hal_Porter · · Score: 1

      http://www.spec.org/cpu2006/Docs/readme1st.html#Q15

      There are several different ways to measure computer performance. One way is to measure how fast the computer completes a single task; this is a speed measure. Another way is to measure how many tasks a computer can accomplish in a certain amount of time; this is called a throughput, capacity or rate measure.
      The SPEC speed metrics (e.g., SPECint2006) are used for comparing the ability of a computer to complete single tasks.
      The SPEC rate metrics (e.g., SPECint_rate2006) measure the throughput or rate of a machine carrying out a number of tasks.

      For the rate metrics, multiple copies of the benchmarks are run simultaneously. Typically, the number of copies is the same as the number of CPUs on the machine, but this is not a requirement. For example, it would be perfectly acceptable to run 63 copies of the benchmarks on a 64-CPU machine (thereby leaving one CPU free to handle system overhead).

      SpecInt2006 and SpecFP2006 are both single core benchmarks, it is only SpecRate which is multicore. In which case this shows a 3Ghz Nehalem has more performance per core than a 5Ghz PPC. That's not too unexpected actually, Nehalem is out of order and POWER6 isn't.

      Looking at int rate for 4 cores, which really is a multicore benchmark,

      http://www.onscale.de/specbrowser/2006-ir-004.html

      it seems like Nehalem is still ahead, but not by as much. Same with int rate for 8 CPUs

      http://www.onscale.de/specbrowser/2006-ir-008.html

      Now for FP rate the picture is different

      http://www.onscale.de/specbrowser/2006-fr-004.html

      Power6 does indeed come out ahead. And for the rate scores with more than 16 cores, pretty much all the scores are for Risc because no one bothers to make x86 machines with that many cores. Mind you Larrabee might change that. Larrabee is in order too, has loads of cores and a wide vector FP unit. If all the rhetoric about being vector complete is true, it seems like it would score very well on Spec FP rate for lots of cores, something x86 is still weak at.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    38. Re:End of an era by hairyfeet · · Score: 1

      You know, you have made me think of a subject I have always been curious about: WTF are Linux guys even needing a decent GPU for, anyway? It isn't like you are going to game on the thing, and nearly all the GPU based video acceleration beyond Mpeg2 is proprietary voodoo that you are lucky to get working in Windows, much less Linux. And if you are wanting to play one of the few Linux games you can play it on several generations old hardware, which should be well supported by now.

      So WTF ARE you going to do with relatively new 3d hardware, anyway? Not trying to troll here, I'm just seriously curious. I have never understood the point of having decent 3d hardware on Linux as its main purpose....doesn't actually run on Linux. It would be like me buying a bunch of Apple software and then complaining that it don't run under Windows. These cards are built for Windows gaming, that is their whole fricking point, and where 99.999% of the money they make on these things comes from. So frankly the fact that ATI and Nvidia have given Linux users the amount of support that they have is amazing, considering that they can't use the card for what it was actually intended for, not really. So WTF are you actually wanting to use relatively new 3d hardware FOR anyway?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    39. Re:End of an era by turgid · · Score: 1

      And of course, I want the x86 ISA to die. Having a 64-bit CPU still boot in 16-bit compatibility mode is just madness.

      Hear hear. Mind you, it only takes a few instructions to make the jump to (at least) 32-bit protected-mode.

    40. Re:End of an era by heson · · Score: 1

      The only thing software raid needs to make hwraid obsolete is reliable nvram (reliable wrt power loss)

    41. Re:End of an era by Anonymous Coward · · Score: 0

      Blender, Maya, custom tools for a studio, research, education, industry specific applications...

    42. Re:End of an era by LWATCDR · · Score: 1

      Good point about the raid but hey if you can get it off the CPU why not? Many servers have at least a low end gpu on the motherboard and they spend most of their life just sitting idle.
      I really do see encryption being the biggest use for gpus. I wonder if compression would be another good use. Maybe for something like a VOIP server.
       

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    43. Re:End of an era by hairyfeet · · Score: 1

      Then you should be using the Nvidia Quadro or ATI FireGL and NOT the gamer hardware, which is frankly designed for FPS above all, including accuracy. The consumer cards are only really built for gaming, which frankly is Windows. COD4, Crysis, etc is what those cards are designed for NOT actually getting any work done on them.

      So if you ARE using the consumer card for actual work, which is basically screwing the card manufacturers out of cash by jury rigging a card to do what it isn't really built to do, are you really surprised that the drivers kinda suck? I'm sure their workstation drivers are much better, since they are designed to actually get work done on Linux. That doesn't change the fact that the bread and butter for these companies, where 99.999% of their money is coming from, is NOT Linux. And let us be honest here: Linux gaming is even less of a blip than Mac gaming.

      So you want a card designed to run DirectX games in Windows, which is where the money is, to do a job that it wasn't designed for AND you want the company to support you in those efforts, when they have a more expensive card that was actually BUILT for that purpose, but you just don't want to spend the cash. What's wrong with this picture? If you want to run 3d production on Linux then you should step up and buy a card that was designed for the task. I mean I could probably pull a boat with a Ford Escort, that doesn't mean I should nor should I expect Ford to help me out with that when they have the F150 built for that particular task. It just seems a little nuts to me to expect good drivers for a DirectX gaming card in an environment that wasn't built to support the main purpose the card was designed for.

      --
      ACs don't waste your time replying, your posts are never seen by me.
  8. Structural engineering welcomes this. by GreatBunzinni · · Score: 4, Interesting

    As a structural engineer in training who is starting to cut his teeth on writing structural analysis software, these are truly interesting times in the personal computer world. Technologies like CUDA, OpenCL and maybe also Larrabee are making it possible to simply place on any engineer's desk a system capable of analysing complex structures practically instantaneously. Moreover, it will also push the boundaries of that sort of software, making it possible to, for example, model composite materials such as reinforced concrete through the plastic limit, a task that involves simulating random cracks through a structure in order to get the value of the lowest supported load and that, with today's personal computers, takes hours just to run the test on a simple, simply supported, single-span beam.

    So, to put this in perspective, this sort of technology will end up making it possible for construction projects to be cheaper, safer and quicker to finish, all in exchange for a couple hundred dollars on hardware that a while back was intended for playing games. Good times.

    --
    Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    1. Re:Structural engineering welcomes this. by Anonymous Coward · · Score: 0

      Isn't the value of the lowest supported load zero? Why would it take hours to compute that?

    2. Re:Structural engineering welcomes this. by Anonymous Coward · · Score: 2, Insightful

      As a seasoned structural engineer (and PhD in numerical analysis), I hate to say this, but this is partly wishful thinking. Even an infinitely powerful computer won't remove some of the fundamental mathematical problems in numerical simulations. I will not start a technical discussion here, but just take some time to learn about condition numbers, for instance. Or about the real quality of 3D plasticity models for concrete, and the incredibly difficult task of designing and driving experiments for measuring them. Etc.
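
      To make the condition number point concrete, here is a minimal sketch in C, assuming a made-up 2x2 system (not taken from any real structural model) whose condition number is on the order of 10^4. The function name and the row-major matrix layout are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: solve the 2x2 system A x = b with Cramer's rule.
 * A is stored row-major as {a11, a12, a21, a22}. */
void solve2x2(const double *A, const double *b, double *x)
{
    double det = A[0] * A[3] - A[1] * A[2];
    assert(det != 0.0);                 /* singular systems are not handled */
    x[0] = (b[0] * A[3] - A[1] * b[1]) / det;
    x[1] = (A[0] * b[1] - A[2] * b[0]) / det;
}
```

      With A = {1, 1, 1, 1.0001} and b = {2, 2.0001} the solution is about (1, 1). Change the last entry of b to 2.0002, a change in the fifth significant digit, and the solution jumps to about (0, 2). A faster machine computes that second answer sooner, but it does not make measured inputs any more precise.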

    3. Re:Structural engineering welcomes this. by GreatBunzinni · · Score: 2, Informative

      When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse. Then, if we compare it with the static limit of said structure (see limit analysis' lower bound theorem) we can pinpoint the exact resistance to failure of a structure and, from there, optimize it and make it safer. Which is a nice thing to do in terms of safety and cost.

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    4. Re:Structural engineering welcomes this. by Anonymous Coward · · Score: 1, Insightful

      "Math is hard" - Barbie

    5. Re:Structural engineering welcomes this. by MichaelSmith · · Score: 1

      modeling composite materials such as reinforced concrete through the plastic limit

      I wonder if that software could also improve animation, by making solid objects which look as if they actually have weight. Too many avatars seem to be hovering just above the ground because you don't see the forces being transmitted through their bodies.

    6. Re:Structural engineering welcomes this. by Tenebrousedge · · Score: 2, Interesting

      That's a problem with the animator. You don't need complicated software to make good animation--Toy Story should be sufficient evidence of that. You just need talent. Less and less talent these days, actually: if you're playing a game where the avatars are floating, it's because the designers don't give a^H^H^H^H^H^H^H care enough to simulate motion properly.

      As an aside, realism is frequently not a goal in animation. You tend to run up against the uncanny valley: all the characters look like zombies. Realism is what made "A Scanner Darkly" so painful to watch, especially as contrasted to "Waking life".

      I think Larrabee has somewhat more potential to improve ray tracing. Lighting in games these days seems like layers of, well, kludges. The code works, and it's fast, but it's an ugly, ugly solution.

      --
      Those who advocate genocide deserve every protection afforded by law, and none afforded by common human decency.
    7. Re:Structural engineering welcomes this. by Hurricane78 · · Score: 1

      Hey, "A Scanner Darkly" was not painful to watch. Not for anyone except you. ^^
      I have seen it with many people, and most of them liked it. Some of them did find it a bit slow/boring. But nobody found it to be painful.

      So if you always presume you are talking just about your views, then I apologize. But if not, please stop assuming everybody has your point of view. :)
      Thank you. :)

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    8. Re:Structural engineering welcomes this. by Anonymous Coward · · Score: 0

      That's a different situation.

      With an animation like 'toy story' they can tweak each pose till it looks right. And it only needs look right from one camera angle. There does not have to be any physics involved at all.

      With a computer game, the poses have to be generated on the fly, and respond in real time to the environment.

    9. Re:Structural engineering welcomes this. by Anonymous Coward · · Score: 0

      Realism is what made "A Scanner Darkly" so painful to watch

      Linklater wasn't going for realism, the subject matter of the book and the fact he used rotoscoping should make that clear. The drug featured in the story is called "substance D" or "death" so your zombie analogy is quite accurate, although hardly an unintended side effect. The only thing that made Scanner Darkly a difficult film to watch was the hokey, bastardized ending.

      With only a handful of exceptions, I see artifice increasingly taking the back seat whenever there's digital art involved in a production. I don't see Larrabee changing the "factory" mentality and I'm not in the slightest interested in the potential of Larrabee for graphics or audio. Fast ubiquitous crypto is the considerably more interesting use case -- a scramble suit for all.

    10. Re:Structural engineering welcomes this. by Tenebrousedge · · Score: 0, Troll

      Your post is saccharine, condescending, vapid, and irrelevant to the subject at hand.

      If you have an opinion on anything I was actually discussing, please share it. As an example, you could explore the visual differences between the two contrasted films, the science behind the 'uncanny valley', or perhaps discuss the merits of rotoscoping in general.

      "I liked the movie. Some of my friends liked the movie."

      The inanity is stupefying.

      --
      Those who advocate genocide deserve every protection afforded by law, and none afforded by common human decency.
    11. Re:Structural engineering welcomes this. by LingNoi · · Score: 1

      You tend to run up against the uncanny valley:

      I completely disagree that uncanny valley is a problem. It's simply not realistic enough and it's all down to the movement of said realistic character.

      What are we? Meat bags. We're walking bags of water, yet when ultra realistic game characters are animated they're done so in a Barbie doll fashion without taking into account the movement of mass on what they are simulating.

      If game characters factored this in and did it right "uncanny valley" wouldn't exist, yet it does because game developers have failed to see why their game characters look unreal even though they're modelled perfectly and have given up.

      Watch this to get a better idea of what I am talking about.

    12. Re:Structural engineering welcomes this. by m50d · · Score: 1

      It's condescending, and deservedly so, because you made an unsupported and likely false assertion. If you're going to claim something is "painful to watch", anyone interested in serious discussion should realise that's going to be contentious; you'd better give evidence to support it.

      --
      I am trolling
    13. Re:Structural engineering welcomes this. by maxume · · Score: 1

      There is science behind the uncanny valley?

      --
      Nerd rage is the funniest rage.
    14. Re:Structural engineering welcomes this. by LWATCDR · · Score: 1

      Kind of scary if you ask me. It sounds like you are trying to use simulation to reduce the margin of error you build into a structure. While that can be a good thing it isn't always. It puts a lot higher demand on quality control at the building site which is often outside the control of the engineer.
      Kind of reminds me of some really nice homebuilt aircraft in the 80s. They used very low drag laminar flow airfoils. They were very fast and worked well. Soon some were falling out of the sky on take off. They were flying from grass fields. If the grass was freshly cut and happened to stick on the leading edge of the wing that nice airfoil became a nightmare.
      It was soon found that even bugs could trip the airflow from laminar to non-laminar. It was something that the computer simulations and even the wind tunnel tests didn't notice. After all who puts dead bugs on their pretty model.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    15. Re:Structural engineering welcomes this. by mgblst · · Score: 1

      You have had this sort of power for years, just not on one machine. Set up a small cluster in your office, using unused machines, or unused cycles on people's machines. Look at Condor, easy to set up, very powerful. You now have access to a mainframe-like pool of computers.

    16. Re:Structural engineering welcomes this. by Pseudonym · · Score: 2, Informative

      As a structural engineer in training who is starting to cut his teeth on writing structural analysis software, these are truly interesting times in the personal computer world.

      There's one drawback to the current crop of CPUs, though. More cores per die means less cache per core. So depending on what you're doing, this could actually degrade performance (all other things being equal) over older SMP machines.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    17. Re:Structural engineering welcomes this. by GreatBunzinni · · Score: 1

      That may be true, but I believe that an SMP machine with 4 or 8 fast cores with plenty of cache, when performing tasks involving linear algebra with large matrices (or a whole lot of small-ish ones), cannot compete with CUDA/OpenCL's hundreds of slower, cache-less cores, not only in performance (linear algebra on a large-ish matrix or lots of smallish matrices is one of those embarrassingly parallel problems) but also in terms of cold, hard cash (over 200 euros for the cheapest Xeon, nearly 300 euros for a Radeon HD 4890 with its 800 "stream processing units").

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    18. Re:Structural engineering welcomes this. by Pseudonym · · Score: 1

      Like I said, it depends on what you're doing and on all other things being equal.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  9. What? Is 15GB that much for a base OS install? by symbolset · · Score: 4, Informative

    Your post can be summarized as: Intel Giveth; Microsoft taketh away. That's been the formula for far too long.

    And that period is almost over.

    --
    Help stamp out iliturcy.
  10. Isn't it high time for a 80x86 cleanup? by master_p · · Score: 0, Troll

    There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used. A clean break from 80x86 is needed. Legacy 80x86 code can run perfectly in emulation (and need not be slow, using JIT techniques).

    What I like most about Larrabee is the scatter-gather operations. One major problem in vectorized architectures is how to load the vectors with data coming from multiple sources. The Larrabee ISA solves this neatly by allowing vectors to be loaded from different sources in hardware and in parallel, thus making loading/storing vectors a very fast operation.

    The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.

    1. Re:Isn't it high time for a 80x86 cleanup? by ShakaUVM · · Score: 3, Insightful

      The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.

      Intel has a lot of smart people in their compilers group, and they've done stuff like this before in different times in the past. I wouldn't at all be surprised if they released compiler extensions to allow quick loading of data into the processing vectors.

    2. Re:Isn't it high time for a 80x86 cleanup? by joib · · Score: 5, Informative

      There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used. A clean break from 80x86 is needed. Legacy 80x86 code can run perfectly in emulation (and need not be slow, using JIT techniques).

      All the legacy junk takes up a pretty small fraction of the area. IIRC on a modern x86 CPU like Core2 or AMD Opteron, it's somewhere around 5%. Most of the core is functional units, register files, and OoO logic. For a simple in-order core like Larrabee the x86 penalty might be somewhat bigger, but OTOH Larrabee has a monster vector unit taking up space as well.

      What I like most about Larrabee is the scatter-gather operations. One major problem in vectorized architectures is how to load the vectors with data coming from multiple sources. The Larrabee ISA solves this neatly by allowing vectors to be loaded from different sources in hardware and in parallel, thus making loading/storing vectors a very fast operation.

      Yes, I agree. Scatter/gather is one of the main reasons why vector supercomputers do very well on some applications. E.g. scatter/gather allows sparse matrix operations to be vectorized, and allows the CPU to keep a massive number of memory operations in flight at the same time, whereas sparse matrix ops tend to spend their time waiting on memory latency when you have just the usual scalar memory ops.
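
      To make the sparse matrix case concrete, here is a minimal sketch of a sparse matrix-vector product in compressed sparse row (CSR) form. The function name and the CSR layout are assumptions chosen for illustration, not anything from the Larrabee documentation. The load x[col_idx[j]] is the indexed access that a hardware gather would let a compiler issue for a whole vector of j values at once.

      ```c
      #include <stddef.h>

      /* y = A * x for a sparse matrix A stored in CSR form (illustrative layout):
       * row_ptr has n_rows + 1 entries; col_idx and val hold the nonzeros. */
      void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
                    const double *val, const double *x, double *y)
      {
          for (size_t i = 0; i < n_rows; ++i) {
              double sum = 0.0;
              for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
                  /* x[col_idx[j]] is an indexed (gather-style) load: the
                   * addresses are data dependent and usually not contiguous. */
                  sum += val[j] * x[col_idx[j]];
              }
              y[i] = sum;
          }
      }
      ```

      With only scalar loads, each x[col_idx[j]] is a separate memory operation that may miss in cache; a gather lets many of them be in flight together.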

      The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.

      There is the "restrict" keyword in C99 precisely for this reason. It's not in C++ but most compilers support it in one way or another (__restrict, #pragma noalias or whatever). That being said, I'd imagine something like OpenCL would be a more suitable language for programming Larrabee than either C, C++ or Fortran. Functional languages are promising for this as you say, of course, but it remains to be seen if they manage to break out of their academic ivory towers this time around.
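
      For readers who have not used it, a minimal sketch of what restrict looks like in C99. The function name is made up for illustration. The qualifier is a promise from the programmer that x and y do not overlap, which is the same no-aliasing assumption Fortran makes for dummy arguments.

      ```c
      #include <stddef.h>

      /* y[i] += a * x[i]. The restrict qualifiers promise the compiler that
       * x and y never alias, so it is free to load, compute and store several
       * elements at a time. Passing overlapping arrays is undefined behavior. */
      void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
      {
          for (size_t i = 0; i < n; ++i)
              y[i] += a * x[i];
      }
      ```

      Without restrict the compiler has to allow for a store to y[i] changing a later x[i + k], so it must either prove the arrays are distinct, emit a run-time overlap check, or fall back to scalar code.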

    3. Re:Isn't it high time for a 80x86 cleanup? by serviscope_minor · · Score: 4, Insightful

      The programming languages that will benefit from Larrabee though will not be C/C++.

      Awwwww :-(

      It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.

      Oh. You mean like restrict which has been in the C standard for 10 years?

      GCC supports it for C++ too. I'd be surprised if ICC and VS didn't support it for C++ too.

      --
      SJW n. One who posts facts.
    4. Re:Isn't it high time for a 80x86 cleanup? by serviscope_minor · · Score: 1

      Seriously, can someone tell me why my post was a troll? The GP was referring to the lack of a feature that C has had for 10 years. The C99 standard came out in 1999 and had the restrict keyword in it. This allows for optimizations on a par with FORTRAN since it provides the same guarantees.

      I know it's fashionable to hate C++ and C over here these days. Perhaps that's the problem.

      --
      SJW n. One who posts facts.
    5. Re:Isn't it high time for a 80x86 cleanup? by ThePhilips · · Score: 1

      There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used.

      Rarely used instructions do not need to be optimized, so they take very few transistors to implement. Only heavily used instructions need to be optimized.

      The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages.

      I hope you do understand that Fortran was fast because programs written in it were also simple. Modern programs combine lots and lots of math, memory and I/O operations. You can't easily parallelize that. Even now it can be already perfectly parallelized in C/C++, yet resulting software is quite complicated to manage and maintain.

      It sounds stupid, but Sun actually already optimized their Java JIT for the SPARC T1/T2, which are highly multithreaded CPUs.

      --
      All hope abandon ye who enter here.
    6. Re:Isn't it high time for a 80x86 cleanup? by rcallan · · Score: 1


      I think you're preaching to the choir here on killing x86. The x86 ops get translated to RISC ops anyway. What I wonder is why they haven't attempted to release two versions: an x86 version, and a stripped down RISC version without the x86 decoder. Obviously this would be a monumental task at all levels of the design, but it would seem they could get similar performance on the RISC version without as much effort as needed for the x86 version since that overhead is removed. I would guess (and hope) that most of their design effort goes into optimizing the design in the RISC world after the instructions are translated anyway. This will never happen though because Windows == x86 only. Being able to compile most of the needed applications from source gives hardware designers the freedom to shed legacy interfaces every 5 years instead of every 30. It would be a glorious future if hardware producers started realizing that open source software == greater hardware design flexibility == better performance/cost. Hopefully this is already happening with the shift from x86 to ARM on netbooks.

    7. Re:Isn't it high time for a 80x86 cleanup? by makomk · · Score: 1

      According to TFA, the scatter/gather instructions are actually pseudo-instructions handled by the assembler on the current version of Larrabee.

    8. Re:Isn't it high time for a 80x86 cleanup? by gnasher719 · · Score: 2, Insightful

      What I wonder is why they haven't attempted to release two versions: an x86 version, and a stripped down RISC version without the x86 decoder.

      If you looked at what Intel has been doing recently, the RISC code that x86 is translated to has been slowly evolving. For example, sequences of compare + conditional branch become a single micro op. Instructions manipulating the stack are often combined or not executed at all. So what is the perfect RISC instruction set today isn't the perfect RISC instruction set tomorrow. And Intel's RISC instruction set would likely be quite different from AMD's.

    9. Re:Isn't it high time for a 80x86 cleanup? by Anonymous Coward · · Score: 0

      You're right. You were hit by a drooling moron moderator. I counteracted his damage by giving you an insightful. Hopefully some meta moderator will catch up to the moron, but I don't have too much hope.

      It is mostly people not bright enough to master and use it effectively who hate C++.

    10. Re:Isn't it high time for a 80x86 cleanup? by Anonymous Coward · · Score: 0

      I don't see a reason for the legacy junk either, as emulation will work fine; however, given that the 8086 and 80286 were implemented using transistors that were several orders of magnitude fewer in number, it's doubtful you gain much by leaving them out. However, every time Intel has tried making a "clean break", they've stumbled. How do you go about booting a computer in protected mode, anyway?

    11. Re:Isn't it high time for a 80x86 cleanup? by speedtux · · Score: 1

      Oh. You mean like restrict which has been in the C standard for 10 years?

      Sorry, but "restrict" is not sufficient. Fortran has built in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.

    12. Re:Isn't it high time for a 80x86 cleanup? by Agripa · · Score: 1

      I have often wondered if a more orthogonal superset of the existing instruction set with a clean instruction encoding would be better. The JIT compiler would be dead simple since it would only have to translate. AMD apparently had that option when they designed x86-64 but decided full compatibility was more important.

    13. Re:Isn't it high time for a 80x86 cleanup? by joib · · Score: 1


      Sorry, but "restrict" is not sufficient.

      For the most part, it is. There are some other minor things, but restrict closes the majority of the performance gap between C and Fortran. Oh yes, and C99 has some truly retarded pedantry wrt. complex arithmetic, so you might need to use some compiler option to get around that.


      Fortran has built in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.

      If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++. Unfortunately the semantics of FORALL and array expressions makes them not much simpler to analyze for dependencies than normal loops, and the penalty for failing is higher.

      However, the arrays in F90+ are very nice to program with; compared to those, programming with arrays in C is like poking your eyes out with a fork. But that's mostly a convenience feature, as such it doesn't improve performance (as long as you don't do your multidimensional C arrays in the popular but suboptimal array-of-arrays style).
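
      As a rough sketch of the difference: the array-of-arrays style allocates one block per row and chases a pointer on every row access, while the layout below keeps the whole matrix in one contiguous block and computes the offset. The type and function names are hypothetical, chosen only for illustration.

      ```c
      #include <stdlib.h>

      /* A rows x cols matrix stored as one contiguous row-major block. */
      typedef struct {
          size_t rows, cols;
          double *data;
      } matrix;

      /* Allocates zero-initialized storage; returns 0 on success, -1 on failure. */
      int matrix_init(matrix *m, size_t rows, size_t cols)
      {
          m->rows = rows;
          m->cols = cols;
          m->data = calloc(rows * cols, sizeof *m->data);
          return m->data ? 0 : -1;
      }

      /* Element (i, j) lives at offset i * cols + j: one multiply-add,
       * no second pointer dereference as with double **. */
      double *matrix_at(matrix *m, size_t i, size_t j)
      {
          return &m->data[i * m->cols + j];
      }

      void matrix_free(matrix *m)
      {
          free(m->data);
          m->data = NULL;
      }
      ```

      Because consecutive j values are adjacent in memory, an inner loop over j is a unit-stride access, which is the pattern vectorizers handle best.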

      If you have an irrational hatred for Fortran, at least do yourself the favor of using C++ where you can encapsulate multidimensional arrays in a class with overloaded operators. There's a bunch of such high-performance implementations around, such as Eigen, blitz, boost::multiarray and so forth, so no need to reinvent the wheel.

    14. Re:Isn't it high time for a 80x86 cleanup? by David+Greene · · Score: 1

      The "restrict" keyword gets you most of the Fortran advantages in C, though a lot of people misunderstand what it means.

      The C++ folks are working on a memory model specification intended to open up parallelism. I haven't kept up on the details of that, though.

      --

    15. Re:Isn't it high time for a 80x86 cleanup? by David+Greene · · Score: 1

      OpenCL is a big ugly hack meant to provide a standard API for legacy GPUs. It's totally inappropriate for something like Larrabee, which is much more general purpose. A good vectorizing compiler will be able to make use of most of Larrabee's features directly. You'll be able to write code in standard languages, with an eye toward writing in a way that makes vectorization possible. While this does require that programmers get trained to understand things like loop dependence, it doesn't require learning a whole new language and API.
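
      A small sketch of what "loop dependence" means in practice, with function names invented for the example. The first loop has independent iterations and is a natural candidate for vectorization; the second carries a dependence from one iteration to the next and cannot be vectorized as written.

      ```c
      #include <stddef.h>

      /* Independent iterations: c[i] depends only on a[i] and b[i], so a
       * compiler may compute many elements per vector instruction. */
      void add_arrays(size_t n, const float *restrict a,
                      const float *restrict b, float *restrict c)
      {
          for (size_t i = 0; i < n; ++i)
              c[i] = a[i] + b[i];
      }

      /* Loop-carried dependence: a[i] needs the a[i - 1] that the previous
       * iteration just wrote, so iterations cannot simply run side by side. */
      void prefix_sum(size_t n, float *a)
      {
          for (size_t i = 1; i < n; ++i)
              a[i] += a[i - 1];
      }
      ```

      Whether a particular compiler actually vectorizes the first loop depends on the compiler and its flags; the point is only that nothing in the loop's semantics prevents it.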

      --

    16. Re:Isn't it high time for a 80x86 cleanup? by mlts · · Score: 1

      You could put a hypervisor on a lower level for this functionality, but that brings its own can of worms, including figuring out how to pass communication from hardware devices to the operating systems. For example, which OS should set the proper settings on a USB toaster?

    17. Re:Isn't it high time for a 80x86 cleanup? by speedtux · · Score: 1

      For the most part, it is.

      No, for the most part it isn't.

      If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++.

      Yes, the reason is that FORALL and parallel matrix operations are for vectorized code, while OpenMP is for multicore code.

      There's a bunch of such high-performance implementations around, such as Eigen, blitz, boost::multiarray and so forth, so no need to reinvent the wheel.

      All those libraries are excessively complex, hard to use, and have limited functionality compared to Fortran arrays. They often also don't perform well. And since C++ does not support "restrict", none of them can even communicate pointer restrictions to the compiler.

      So, with the "restrict" keyword and a lot of effort, you can get some parallelization out of a C compiler; for C++, all you get is no restrict keyword and complicated libraries. Altogether, neither C nor C++ are good choices for numerical programming.

    18. Re:Isn't it high time for a 80x86 cleanup? by master_p · · Score: 1

      Well, it's correct that 'restrict' is part of the C99 standard. But:

      a) it is not supported by all C compilers.
      b) it is not in the C++0x standard.
      c) people don't use it because it is cumbersome to do so (having to type 'restrict' at each and every turn...)

      On the other hand, pure functional languages are in 'restrict' mode by default, and so is FORTRAN.

    19. Re:Isn't it high time for a 80x86 cleanup? by joib · · Score: 1

      No, for the most part it isn't.

      Yes, it is. Look, C and C++ weren't designed with parallelism in mind, but neither was Fortran. The only parallel dialect of Fortran that saw some use was HPF, and that was a failure (see below). Back in the days of yore when vector supercomputers and dinosaurs roamed the earth, Fortran had a big advantage over C. This was largely due to the semantics of C which didn't have restrict at the time, as well as the relative immaturity of C compilers at that point. But all that code that the legendary Cray vectorizing Fortran compiler compiled was just standard F77, which had no FORALL nor array expressions. Vectorization was done on normal DO loops. Just like a modern C or C++ compiler can vectorize for-loops, possibly with the aid of restrict (or __restrict, #pragma noalias or whatever the extension is called in your C++ compiler)

      "If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++." Yes, the reason is that FORALL and parallel matrix operations are for vectorized code, while OpenMP is for multicore code.

      HPF is a failure in many respects. Fundamentally it died because on distributed memory machines its performance was poor compared to MPI. The parts of HPF that were included into F95, most notably FORALL, are widely considered to be mistakes. As I mentioned in my previous point, the semantics of FORALL make analyzing it no simpler than analyzing the equivalent DO loop, and the performance penalty for failing is higher. Hence it's always better to just use a normal DO loop. In fact F2008 is adding a "DO CONCURRENT" loop so that the programmer can explicitly tell the compiler there are no loop-carried dependencies; this is a direct response to the difficulty of properly optimizing FORALL.

      As for F90 array expressions, they have similar optimization issues as FORALL, so if one is really concerned with maximum performance it's safer to use DO loops. The real value of the array expressions is that they make the code simpler and shorter, not that they provide better performance than normal loops.

    20. Re:Isn't it high time for a 80x86 cleanup? by speedtux · · Score: 1

      Come on, stop putting up straw men. Fact is, C has "restrict" but no usable multidimensional arrays, and C++ doesn't have "restrict" at all. Fortran has restricted pointers and full multidimensional arrays. What more is there to say?

      As for the other points you raise, your views are rather simplistic and narrow, but there's no point in debating that with you further.

    21. Re:Isn't it high time for a 80x86 cleanup? by shutdown+-p+now · · Score: 1

      GCC supports it for C++ too. I'd be surprised if ICC and VS didn't support it for C++ too.

      If by "VS" you mean Visual C++, then it doesn't support C99 at all - and, consequently, doesn't have "restrict" in either C or C++ mode.

      It does have __restrict and __declspec(restrict) though, in either mode, but with slightly different semantics (core optimizations enabled by it are still the same, of course).

  11. Re:What? Is 15GB that much for a base OS install? by Anonymous Coward · · Score: 0

    It is over already, Gates retired, remember?

  12. If Intel are smart they will mix Core and Larrabee by jonwil · · Score: 3, Insightful

    If Intel are smart they will release a chip containing one core (or 2 cores) from some kind of lower-power Core design and a pile of Larrabee cores on the one die along with a memory controller and some circuits to produce the actual video output to feed to the LCD controller, DVI/HDMI encoder, TV encoder or whatever. Then do a second chip containing a WiFi chip, audio, SATA and USB (and whatever else one needs in a chipset). Would make the PERFECT 2-chip solution for netbooks if combined with a good OpenGL stack running on the Larrabee cores (which Intel are talking about already).

    Such a 2-chip solution would also work for things like media set top boxes and PVRs (if combined with a Larrabee solution for encoding and decoding MPEG video). PVRs would just need 1 or 2 of whatever is being used in the current crop of digital set top boxes to decode the video.

    As for the comment that people will need to understand how to best program Larrabee to get the most out of it, most of the time they will just be using a stack provided by Intel (e.g. an OpenGL stack or an MPEG decoding stack). Plus, it's highly likely that compilers will start supporting Larrabee (Intel's own compiler for one if nothing else).

  13. Re:An architecture named after a Get Smart charact by Anonymous Coward · · Score: 0

    I don't think so.

  14. Re:If Intel are smart they will mix Core and Larab by SpazmodeusG · · Score: 1

    I was thinking that. Larrabee's vector unit looks like it could just replace SSE entirely.

    Which does raise a question - will Intel keep SSE if it adds in the Larrabee vector unit as yet another legacy feature? I'm guessing it will (sigh).

  15. Not really x86 by makomk · · Score: 0, Troll

    This isn't really x86, in my opinion; it's x86 with a separate set of very obviously graphics-oriented instructions bolted on top. Since getting decent performance will require using the new instructions and a new programming model almost exclusively, what's the point of the x86 bit? Well, other than marketing reasons and to prevent companies like NVidia releasing their own version, of course...

    1. Re:Not really x86 by Hal_Porter · · Score: 1

      I think the point is that in the long run you will have one Larrabee like chip in a desktop that does both the CPU and GPU functions. And in a server that same chip could manage a huge thread pool, which is the best way to do server applications IMO.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    2. Re:Not really x86 by Anonymous Coward · · Score: 0

      It's x86 because that's what Intel is already good at. Also, that's the whole point. Nvidia and ATI have vector processing chips, but that's all they're good for -- they're useless for general purpose programs because all they do is the vector part. Larrabee is great because it can run GP code as well as vector code.

      dom

    3. Re:Not really x86 by makomk · · Score: 2, Insightful

      Perhaps. As it stands, though, I don't think Larrabee can run all standard x86 code, since it doesn't support legacy instructions. Plus, even if it did, the performance would suck. For desktop use, it probably makes more sense to have some real x86 cores and a bunch of simpler graphics cores that don't have to be x86. To get full benefit from Larrabee, the code has to be threaded anyhow, so there's not so much point being able to run it on the same core as the standard x86 code.

    4. Re:Not really x86 by julesh · · Score: 2, Interesting

      This isn't really x86, in my opinion; it's x86 with a separate set of very obviously graphics-oriented instructions bolted on top. Since getting decent performance will require using the new instructions and a new programming model almost exclusively, what's the point of the x86 bit?

      The point is that there's stuff those graphics-oriented instructions are really not very good at, like indirect memory referencing and branching logic, both of which x86 excels at handling. Now, that kind of workload isn't common on GPUs _at the moment_, but both of those are common operations, for example, in ray tracing, so you may see them become more important over the next few years. What Intel are doing here is defining the GPU architecture for the next decade, and it's one that allows more complex algorithms to be implemented than can easily be done using the specialized stream processing systems we have at the moment.

      The other point behind the x86 bit is that not only did Intel already have core designs that implemented it (Larrabee simply has the new registers & instructions bolted on to an existing low-power Pentium-class core), thus enabling faster time to market than if they'd developed entirely new hardware, they also have a massive amount of software support for the architecture, including one of the best optimizing C++ compilers there is. A new ISA would have required a new compiler, thus further complicating the project. As it is, only extensions to their existing compiler have been necessary.

  16. Re:If Intel are smart they will mix Core and Larab by seeker_1us · · Score: 2, Insightful

    I don't think we will see this in notebooks for a while. We need to wait and see what the real product looks like (Intel hasn't released any specs), but Google for Larrabee and 300W and you will see the scuttlebutt is that this chip will draw very large amounts of power.

  17. Re:If Intel are smart they will mix Core and Larab by joib · · Score: 1

    Yeah, most x86_64 ABI's use SSE for scalar floating point, so it's too late to remove it. But hey, at least SSE is an improvement over x87.

  18. Hackers. by Anonymous Coward · · Score: 0

    Was the best movie of all time.

    1. Re:Hackers. by Anonymous Coward · · Score: 0

      that's only because the actors weren't mexican.

  19. SIMD... is it the right way to go? by loubs001 · · Score: 1

    I'm skeptical about the future of SIMD and even instruction level parallelism in general for massively parallel processors. The problem with this is that in order to get maximum utilization of all of the ALUs in the processor, you have to fill the entire vector with data that you can perform the SAME operation on. This means it's up to the programmer or compiler to write highly vectorizable code. If you can't fill these huge 512-bit vectors, arithmetic units are going to be idle. Nvidia realised this years ago, and so since the G80 their architectures have been scalar. Without vectors you can run a lot more scalar threads while keeping ALL the units busy all the time. Win-win. I'll need some serious convincing if I'm to believe Intel is a real threat to Nvidia in this space, especially for GPGPU.
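
    The idle-lane concern can be illustrated with a scalar emulation of one 16-lane operation under a per-lane mask. This is only a sketch: the function name is made up, and the mask is an assumption about how a wide vector unit skips lanes, not a description of a specific Larrabee instruction. The return value is how many of the 16 lanes did useful work.

    ```c
    #include <stdint.h>

    #define LANES 16 /* 16 x 32-bit floats = one 512-bit vector */

    /* Emulates a single masked vector add. Lanes whose mask bit is clear
     * leave out[] untouched; on real SIMD hardware those lanes still occupy
     * the instruction slot but produce nothing useful. */
    int masked_add(const float *a, const float *b, float *out, uint16_t mask)
    {
        int active = 0;
        for (int lane = 0; lane < LANES; ++lane) {
            if (mask & (1u << lane)) {
                out[lane] = a[lane] + b[lane];
                ++active;
            }
        }
        return active;
    }
    ```

    A mask of 0xFFFF uses every lane; a mask of 0x0005 uses 2 of 16, which is the kind of utilization loss that branchy or poorly vectorizable code runs into.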

    1. Re:SIMD... is it the right way to go? by Anonymous Coward · · Score: 2, Informative

      nVidia G80 is scalar in the sense it's not VLIW (like ATI is), but it still has 32-wide SIMD. (Likely to go to 16 in next generations). 32*16 is actually 512 bits too.

      Doing a truly scalar architecture would have an enormous cost in instruction caches - you'd need to move as many instructions as data around the chip, and that won't be cheap. So SIMD is going to be around for a while.

      Nice try, more research next time.

    2. Re:SIMD... is it the right way to go? by Anonymous Coward · · Score: 0

      In order to get maximum utilization of all of the ALUs in the processor, you have to fill the entire vector with data that you can perform the SAME operation on

      That's what is bothering me too, since I did not notice any explicit memory hierarchy, just a bunch of caches. It seems this incarnation of Larrabee could be quite latency sensitive.

    3. Re:SIMD... is it the right way to go? by makomk · · Score: 1

      Ding ding. Mod parent up. The real difference between current NVidia and ATI is that NVidia has sets of SIMD processors executing the same scalar operations on multiple pieces of data at once, whereas ATI has them executing the same VLIW instruction (up to 5 operations) on multiple pieces of data.

      As I recall, ATI's SIMD cores work on 16 pieces of data at once (16 pixels, for example) whereas the internet information suggests that NVidia's work on 32.

      Larrabee really isn't that revolutionary, by the way; it's more a generalisation of what other graphics cards already do.

    4. Re:SIMD... is it the right way to go? by Rockoon · · Score: 1

      ILP is huge for both x86 and RISC based machines. There is no good reason NOT to execute the next instruction if it's independent of the one currently being executed. It helps to have a compiler which is aware of ILP so that it minimizes sequential dependency chains within the machine code.

      OoO on the other hand isn't that big of a deal, because as long as you have a compiler aware of ILP, the CPU really doesn't have to manage this task except in cases of very mismatched instruction latencies. Most instructions have a 3 cycle latency on a modern processor (1 cycle startup, 1 cycle execute, 1 cycle retire); some instructions deviate considerably (division instructions, for instance).

      Larrabee itself is sort of a hybrid of a pure SIMD and pure MIMD approach: with 16 floats per register (512 bits) and 8 to 16 cores, it is pretty much a balance between SIMD and MIMD.

      Note that nobody is making pure MIMD processors that even come close to competing with the raw processing ability of SIMD; the only non-SIMD setups that can compete are large computer clusters costing way more than Larrabee will.

      This is the same road the video card makers are taking, but they lack some of the advantages that Intel has. Nobody has better fabs than Intel, and only Intel or AMD can eventually integrate this thing on-die with a regular x86-based desktop CPU, which is why nVidia is in very serious trouble.

      --
      "His name was James Damore."
  20. "GPU class rendering in software" by serviscope_minor · · Score: 1

    The claim that this is the first time you can get "GPU class rendering in software"... with nothing more than a pixel sampler to help is somewhat dubious. Modern GPUs are, after all a bunch of stream processors with a pixel sampler. So, really, modern GPU graphics is all in software except the sampling.

    Oh, hey and anyone here remember the voodoo? That was a big (for the time) sampler driven by an x86 CPU. Sound familiar?

    Sarcasm aside, I want one. The peak performance is high, and the programming model is well known. Also, Linux support is likely to be excellent.

    --
    SJW n. One who posts facts.
    1. Re:"GPU class rendering in software" by tepples · · Score: 1

      The claim that this is the first time you can get "GPU class rendering in software"... with nothing more than a pixel sampler to help is somewhat dubious. Modern GPUs are, after all a bunch of stream processors with a pixel sampler. So, really, modern GPU graphics is all in software except the sampling.

      As I understand it, the key difference that makes the software running on Larrabee more like traditional "software" than NV's or ATI's offerings is that Intel is exposing these stream processors' instruction sets to let compiler writers compete on writing shader compilers.

    2. Re:"GPU class rendering in software" by Anonymous Coward · · Score: 0

      ATI has been exposing their instruction set to everyone since R600 (it's somewhere on x.org website even). Nobody ever cared.

      You got suckered :-) (oh, pardon, 'marketed to').

    3. Re:"GPU class rendering in software" by tepples · · Score: 1

      ATI has been exposing their instruction set to everyone since R600

      In that case, "software" might refer to Larrabee's use of an ISA so closely related to one that has already had plenty of research into optimization. Or it could mean an ISA that isn't so limited to the kind of processing that occurs in the sorts of vertex and pixel shaders that we have seen up until now.

  21. Transcendental functions? by julesh · · Score: 2, Interesting

    The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Anyone know what is/isn't supported in this line?

    1. Re:Transcendental functions? by gnasher719 · · Score: 4, Informative

      The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Anyone know what is/isn't supported in this line?

      "Hardware support" doesn't mean "fully implemented in hardware".

      What hardware support do you need for transcendental functions?
      1. Bit fiddling operations to extract exponents from floating point numbers. Check.
      2. Fused multiply-add for fast polynomial evaluation. Check.
      3. Scatter/gather operations to use coefficients of different polynomials depending on the range of the operand. Check.
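
      Points 1 and 2 can be sketched in plain C for a base-2 exponential. This is an illustration under stated assumptions, not Intel's implementation: the function name is invented, the standard library's lrintf and ldexpf stand in for the exponent bit fiddling, and the coefficients are the Taylor terms of 2^f = e^(f ln 2) rather than a tuned minimax polynomial. Point 3 (picking different coefficient sets per operand range) is left out to keep it short.

      ```c
      #include <math.h>

      /* Approximates 2^x for moderate x as 2^n * 2^f, where n is x rounded
       * to the nearest integer and f = x - n lies in [-0.5, 0.5]. */
      float exp2_sketch(float x)
      {
          long n = lrintf(x);       /* integer part, handled via the exponent */
          float f = x - (float)n;   /* small remainder for the polynomial */

          /* Horner evaluation with fused multiply-add: each step is one fmaf.
           * Coefficients are (ln 2)^k / k! for k = 6 down to 1. */
          float p = 1.5403530e-4f;
          p = fmaf(p, f, 1.3333558e-3f);
          p = fmaf(p, f, 9.6181291e-3f);
          p = fmaf(p, f, 5.5504109e-2f);
          p = fmaf(p, f, 2.4022651e-1f);
          p = fmaf(p, f, 6.9314718e-1f);
          p = fmaf(p, f, 1.0f);

          /* Scale by 2^n, i.e. add n to the floating point exponent field. */
          return ldexpf(p, (int)n);
      }
      ```

      On a vector unit the same six multiply-adds would run on 16 operands at once, which is why fused multiply-add support matters more here than a dedicated instruction per function.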

    2. Re:Transcendental functions? by mikael · · Score: 1

      I would guess that they would be the same transcendental functions supported by the other shader languages; Cg, GLSL and Renderman; sine, cosine, tan, asin, acos, atan, sinh, cosh, tanh, if not sincos as well. They are also going to need exp, log, exp2, log2, exp10 and log10. All of these will be required for statistical modeling of texture, 3D animation and image processing. Maybe they won't be vectorized, or maybe it will be possible to treat each 16-element vector as a matrix.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    3. Re:Transcendental functions? by Rufus211 · · Score: 1

      From the C++ prototype guide, which is just the ISA made into a terribly complex C++ wrapper, they support these transcendental functions in the ISA:
      EXP2_PS - Exponential Base-2 of Float32 Vector
      LOG2_PS - Logarithm Base-2 of Float32 Vector
      RECIP_PS - Reciprocal of a Float32 Vector
      RSQRT_PS - Reciprocal of the Square Root of a Float32 Vector

      They also provide library functions that implement everything else you'd want (sin, cos, etc) in software, I assume using Newton-Raphson iteration.
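
      For what Newton-Raphson refinement looks like, here is a hedged sketch for the reciprocal square root. The function name is invented, the code assumes IEEE 754 single precision and a positive, normal input, and the starting guess uses the well-known integer-shift trick as a stand-in for whatever estimate the hardware provides; none of this is taken from the Larrabee library.

      ```c
      #include <stdint.h>
      #include <string.h>

      /* Approximates 1 / sqrt(x) for positive, normal x. */
      float rsqrt_newton(float x)
      {
          uint32_t bits;
          float y;

          /* Rough initial estimate from the bit pattern of x (within a few
           * percent). memcpy avoids aliasing problems when reinterpreting. */
          memcpy(&bits, &x, sizeof bits);
          bits = 0x5f3759dfu - (bits >> 1);
          memcpy(&y, &bits, sizeof y);

          /* Newton-Raphson on f(y) = 1/y^2 - x: y <- y * (1.5 - 0.5 * x * y^2).
           * Each step roughly squares the relative error. */
          for (int i = 0; i < 3; ++i)
              y = y * (1.5f - 0.5f * x * y * y);

          return y;
      }
      ```

      With a hardware estimate such as RSQRT_PS as the seed, one or two of these multiply-add steps would typically be enough, depending on how accurate the estimate is.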

  22. Re:If Intel are smart they will mix Core and Larab by Anonymous Coward · · Score: 0

    A larger, more complex superscalar chip core + several smaller, simpler, and bus connected fixed-function units is also the exact design of the Sony CELL processor.

  23. Where continue may fail with a nested loop by tepples · · Score: 4, Informative

    "Would you believe a GOTO statement and a couple of flags?"

    How about a while loop and a continue statement?

    In C, a continue breaks out of only one nested while or for loop. If you're in a triply nested loop, for example, you can't specify "break break continue" to break out of two nested loops and go to the next iteration of the outer loop. You have to break your loop up into multiple functions and eat a possible performance hit from calling a function in a loop. So if your profiler tells you the occasional goto is faster than a function call in a loop, there's still a place for a well-documented goto.
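
    A minimal sketch of that case, with a hypothetical function and data layout: a triply nested loop over a planes x rows x cols volume, where finding a negative value should abandon both inner loops and move on to the next plane.

    ```c
    #include <stddef.h>

    /* Counts the planes of a row-major np x nr x nc volume that contain no
     * negative value. */
    size_t count_clean_planes(const int *v, size_t np, size_t nr, size_t nc)
    {
        size_t clean = 0;
        for (size_t p = 0; p < np; ++p) {
            for (size_t r = 0; r < nr; ++r) {
                for (size_t c = 0; c < nc; ++c) {
                    if (v[(p * nr + r) * nc + c] < 0)
                        goto next_plane; /* "break break continue" */
                }
            }
            ++clean;
    next_plane: ; /* a label must be followed by a statement; ';' is enough */
        }
        return clean;
    }
    ```

    The goto skips the ++clean and lands at the end of the outer loop body, so the outer for proceeds to its next iteration exactly as a continue would.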

    C++ code can use exceptions to break out of a loop. But statically linking libsupc++'s exception support bloats your binary by roughly 64 KiB (tested on MinGW for x86 ISA and devkitARM for Thumb ISA). This can be a pain if your executable must load entirely into a tiny RAM dedicated to a core, as seen in the proverbial elevator controller, in multiplayer clients on the Game Boy Advance system (which run without a Game Pak present so they must fit into the 256 KiB RAM), or even in the Cell architecture (which gives 256 KiB of local store to each SPE core).

    1. Re:Where continue may fail with a nested loop by jakykong · · Score: 1

      You don't have to use separate functions in C. C does have GOTO's, thank you very much :)

      The only note is that you can't jump to the end of a block. So if your loop is nested such that no loop has code at the end of it, then you might run into a problem. In that case, an empty function that does nothing (and could be optimized away by the compiler) would suffice as a line of code to allow you to jump.

      I'm not an expert, just a hobbyist (at the moment). But that's been my experience. I'd be happy if someone could tell me a better way (which doesn't involve a flag indicating that each loop is supposed to break!)

    2. Re:Where continue may fail with a nested loop by CarpetShark · · Score: 1

      FWIW, I believe setjmp/longjmp are the closest C equivalents to exceptions.

    3. Re:Where continue may fail with a nested loop by CarpetShark · · Score: 1

      Since you specifically asked about what I said above, I'll repeat it here. An alternative to breaks and gotos in C, which gets you functionality more like exceptions, is the setjmp.h header's setjmp (equivalent to except) and longjmp (equivalent to raise or throw).

      http://en.wikipedia.org/wiki/Longjmp#Exception_handling
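
      A rough sketch of that setjmp/longjmp pattern, assuming a search through a flat 2-D array. All names here (search_done, found_row, found_col, scan, find) are hypothetical and exist only for this example.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf search_done;          /* the "catch" point */
static size_t found_row, found_col;  /* result, filled in before the jump */

/* Scans the array and jumps out of both loops on the first match. */
static void scan(const int *m, size_t rows, size_t cols, int target)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            if (m[r * cols + c] == target) {
                found_row = r;
                found_col = c;
                longjmp(search_done, 1);   /* the "throw" */
            }
}

/* Returns 1 and sets found_row/found_col if target is present, else 0. */
int find(const int *m, size_t rows, size_t cols, int target)
{
    if (setjmp(search_done) != 0)
        return 1;                    /* arrived here via longjmp */
    scan(m, rows, cols, target);
    return 0;                        /* scan returned normally: not found */
}
```

      Unlike C++ exceptions, nothing is cleaned up on the way out, so this only suits code that holds no resources between the setjmp and the longjmp.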

    4. Re:Where continue may fail with a nested loop by grimw · · Score: 1

      No, you can just set a boolean in your inner loop that the outer loops check when you break from the inner loop. Also, you can create inline functions if you really wanted to go that route.
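
      A small sketch of the flag approach, assuming a search for two array elements that add up to a target. The function and parameter names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: find the first pair i < j with a[i] + a[j] == target. */
bool find_pair_sum(const int *a, size_t n, int target,
                   size_t *out_i, size_t *out_j)
{
    bool found = false;
    for (size_t i = 0; i < n && !found; ++i) {   /* outer loop checks the flag */
        for (size_t j = i + 1; j < n; ++j) {
            if (a[i] + a[j] == target) {
                *out_i = i;
                *out_j = j;
                found = true;    /* tell the outer loop to stop */
                break;           /* leave the inner loop */
            }
        }
    }
    return found;
}
```

      Each extra nesting level needs its own check of the flag, which is the bookkeeping a goto or a separate function avoids.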

      As for C++, 64 KB of bloat for exceptions is nothing compared to the performance cost of handling exceptions. For the architectures you mentioned, it's an issue, which is why C++ either isn't used or is used without exceptions. However, for the majority of systems, exception-handling performance trumps a boost in run-time size.

    5. Re:Where continue may fail with a nested loop by Briareos · · Score: 1

      The only note is that you can't jump to the end of a block.

      I'm pretty sure an empty statement (which is a statement, after all, and that's what's supposed to come right after a label) should do the trick:

      {
      ...
          goto foo;
      ...
      foo: ;
      }

      np: Autechre - Are Y Are We (WAP72 12")

      --

      "I'm not anti-anything, I'm anti-everything, it fits better." - Sole

    6. Re:Where continue may fail with a nested loop by Blakey+Rat · · Score: 1

      Or you could, you know, TAKE A FUCKING JOKE and not write long drawn-out replies to a funny post.

      Christ, there are some humorless people on this forum.

    7. Re:Where continue may fail with a nested loop by ibsteve2u · · Score: 1

      This can be a pain if your executable must load entirely into a tiny RAM

      Now I know our nation has too sedentary of a lifestyle...we won't even break out the assembler lest we have to JMP up and break a sweat.

      --
      Orwell: "In a Time of Universal Deceit, telling the Truth is a Revolutionary Act"
    8. Re:Where continue may fail with a nested loop by tepples · · Score: 1

      No, you can just set a boolean in your inner loop that the outer loops check when you break from the inner loop.

      For each variable you add, you increase the chance that the compiler will run out of registers and have to spill variables to the stack. Examining the intermediate assembly language files that your compiler generates will help you determine which variables have been spilled.

      Also, you can create inline functions if you really wanted to go that route.

      Which have no access to the other function's local variables unless you put them in a struct and pass the struct to the inline function. (The C standard itself specifies no "inner function" mechanism.) Then they're all spilled.

      For the architectures you mentioned, it's an issue, which is why C++ either isn't used or is used without exceptions.

      I just wanted to avoid the objections of C++ fanboys like StoneCypher. He claims that C++ language features not also in C have little overhead when compiled with the Green Hills compiler, which is unfortunately out of hobbyists' price ranges. That said: Use goto only when your profiler tells you it's better.

    9. Re:Where continue may fail with a nested loop by shutdown+-p+now · · Score: 1

      No, you can just set a boolean in your inner loop that the outer loops check when you break from the inner loop.

      You can, but it's not at all more readable than a goto to a descriptive label.

      Contrary to what many people think, goto isn't evil by itself. It's easy to misuse, but breaking out of loops is precisely one of the cases when it's used properly. In fact, "break" and "continue" are goto in a very thin disguise - they merely let you avoid declaring the label explicitly, but otherwise are exactly the same thing.
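
      To make that equivalence concrete, here is a sketch of the same loop written twice, once with break/continue and once with explicit labels. The function and label names are illustrative only.

```c
#include <stddef.h>

/* Sum the positive entries of a[], stopping at the first zero. */
int sum_structured(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == 0)
            break;
        if (a[i] < 0)
            continue;
        sum += a[i];
    }
    return sum;
}

/* The same loop with the jumps spelled out. */
int sum_goto(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == 0)
            goto loop_exit;      /* what break does */
        if (a[i] < 0)
            goto loop_next;      /* what continue does */
        sum += a[i];
    loop_next: ;                 /* end of the loop body; ++i runs next */
    }
loop_exit:
    return sum;
}
```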

    10. Re:Where continue may fail with a nested loop by sproctor · · Score: 1

      You posted that in response to a joke? Maxwell Smart always says decreasingly impressive things that are clearly untrue. Such as:

      Smart: Surrender, you'll never escape the dozen agents behind that door.
      Villain: There aren't any agents behind that door.
      Smart: Would you believe there's a vicious guard dog?
      Villain: No.
      Smart: How about a small puppy?

      (This is not a quote, I just made it up as an example.)

  24. Isn't that the "highest supported load"? by tepples · · Score: 1

    When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse.

    I think Anonymous Coward was trying to say that the layman's term for this load amount is the "highest supported load" that doesn't cause collapse.

    1. Re:Isn't that the "highest supported load"? by GreatBunzinni · · Score: 1

      I see what you mean. Nonetheless, what I meant instead of "lowest supported load" was the lowest supported plastic load, which basically means the load that leads a certain section of the structural element to stop increasing its resistance proportionally to the applied load (i.e., break).

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
  25. Re:If Intel are smart they will mix Core and Larab by smallfries · · Score: 2, Interesting

    Oddly enough, your post ranks quite highly in that search. Drilling through the forums that show up reveals speculation that a 32-core Larrabee design will use 300W TDP, or roughly 10W per core. There doesn't seem to be any justification for that number, although the Larrabee looks like Atom + stonking huge vector array. The Atom only uses 2W, so it seems hard to believe that the 16-way vector array would use as much power for each FLOP as the entire Atom power budget to deliver that FLOP. Or perhaps it will; it's all just speculation at this point.

    So that 32-core processor would deliver 16x32 = 512 FLOP/clock peak. I would guess that they could deliver a low-power part clocked at 1GHz, judging by the efficiency of Intel's floating point units across the whole range (from Atom up to i7). That part would hit 512 GFLOP/s peak. Then it's just a guessing game of what clock speed they could ramp it up to within that 300W TDP. 2GHz? 3?
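
    The arithmetic above as a tiny sketch. It assumes one FLOP per vector lane per clock, which is the post's own counting; the function name is made up, and the clock speeds are the post's guesses rather than announced figures.

```c
/* Peak throughput in GFLOP/s: cores * lanes per core * clock in GHz.
   Assumes each lane retires one floating-point operation per clock. */
double peak_gflops(int cores, int lanes, double clock_ghz)
{
    return cores * lanes * clock_ghz;
}
```

    With 32 cores and 16 lanes this gives 512 at 1GHz and 1024 at 2GHz; sustained throughput would be lower.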

    The real killer could be how much sustained throughput can be achieved on an x86 derivative. The Core-2 sustained throughputs were mental, but it used every OoO trick that Intel could throw at it. Without that advantage the peak:sustained ratio will be closer to AMD/Nvidia's current offerings.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  26. Excuse the Serenity reference... by chrysrobyn · · Score: 5, Funny

    Article: "Things are going to get interesting."

    nVidia: "Define interesting."

    AMD: "Oh God, oh God, we're all gonna die?"

  27. Fuck Parralization by Anonymous Coward · · Score: 0

    That's why you buy... 2 COMPUTERS! Where's the single string performance! Fuck more! Smaller, faster, Intel get on it!

  28. isa by saiha · · Score: 1

    wtf does the international school of amsterdam have to do with this?

  29. Re:If Intel are smart they will mix Core and Larab by smalfries · · Score: 0, Flamebait

    To answer my own post, I'm not entirely sure about some of the facts. Like,... if I like the musty smell of men, does that mean I'm gay?

    --
    Chase the moment. Where does it go? Exactly.
  30. It is more important by DogAlmity · · Score: 5, Insightful

    I'm gonna go ahead and agree with management that maintainability is more important than any other factor. Having had to maintain a few ancient codebases in my day, I've seen way too many "clever" coders who do ridiculous tricks to save time or space. Well-designed (read: maintainable) code does not imply any significant performance hit.

  31. Re:If Intel are smart they will mix Core and Larab by dr_wheel · · Score: 1

    Like,... if I like the musty smell of men, does that mean I'm gay?

    Either that, or you're French.

  32. .... then a miracle occurs .... by gnetwerker · · Score: 1
    Saying "given software that can parallelize across many such cores" is the same as saying "then a miracle occurs".

    Unless you are interested in a pretty small class of problems, the inherent parallelism of most applications continues to be somewhere in the range 2.1 to 2.5 (i.e., you can speed them up by a little over 2x with the addition of more processors). Thus, in most real-world applications, most of those cores, or vector units, or any other "supercomputer" features will go unused.

    Anyone here who observes a quad-core chip running any particular load anywhere close to 4x the speed of a single core should write a paper about it, because this has been the holy grail of parallel computing for going on 40 years now.

    That Intel thinks this is a solution is sadly typical -- the problem is a software one, not a hardware problem, and they do not know how to solve it.
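
    One common way to model a ceiling like that is Amdahl's law, which the post above does not name; it is brought in here only as an illustration. The sketch assumes a fixed fraction p of the work can be spread across n cores while the rest stays serial, and the function name is invented.

```c
/* Amdahl's law: speedup on n cores when a fraction p (0..1) of the
   run time parallelizes perfectly and the remainder is serial. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

    Under this model, a program that is 60% parallel gets about 1.8x on 4 cores and can never exceed 1 / (1 - 0.6) = 2.5x however many cores are added, which lines up with the upper end of the range quoted above.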

    1. Re:.... then a miracle occurs .... by Courageous · · Score: 1

      Well, virtualization: it's the driving force behind enterprise adoption of multicore technology today. Companies are eating down all the cores they can get. The appetite is so voracious that memory busses are well and truly stressed. Worse, no one really has any serious technological proposal to solving the memory b/w problem as we get to 16 cores or so.

      C//

    2. Re:.... then a miracle occurs .... by Macman408 · · Score: 1

      I did a small project a few years ago with multi-core and multi-processor execution. The workload I used was a variety of parallel sorting algorithms. The fastest machines I had access to at the time were a dual-processor dual-core machine, and a dual-processor quad-core. That's 4 or 8 cores total, with two or four L2 caches (one for each pair of cores). I got close to perfect speedups when running on 2 cores; about 1.95x to 2x the speed of running on a single core. Going to 4 cores didn't give nearly as much speedup (about 3 to 3.5x the speed of a single core running), but that's mainly due to the data sharing between threads; since there were separate L2 caches, there's a significant latency hit any time data is moved between cores that don't share the cache.

      Now by increasing the size of the dataset, the amount of work to be done increases faster than the amount of communication, so the speedup increases (that's why I gave a range of 3 to 3.5x). My datasets weren't huge - maybe 150 MB or so, I think, small enough that they could finish in under a second. Make the amount of data huge, and the speedup will get even closer to 4x on 4 cores. (Of course, running on all 8 cores of the 8-core machine was even less ideal - a speedup of 4 to 5x. But that incurs the extra penalty of storing data back to RAM when it has to move data between the different chips.)

      More modern machines address these communication problems better (for example, by connecting the processors to each other, in addition to connecting them through the northbridge; or by having cache shared by all cores on a single chip), and will do even better.

      The main problem, as you mention, is really software. However, you don't need parallel software to take advantage of many cores - running different programs helps as well. I wish I had more than 2 cores on my 4-year-old desktop, as running a video chat (with all the inherent encoding and decoding) at the same time as various other tasks (say browsing on a flash-heavy site) can be quite slow. On the other hand, now that dual-core machines are fairly mainstream, more applications will take advantage of multiple cores. And I can't think of a whole lot of applications that are both CPU-bound AND are inherently limited to 2x parallelism. Progress just keeps marching along...

  33. Larrabee = Intel & Cray's love child by Anonymous Coward · · Score: 1, Insightful

    As a former Cray employee I find it interesting to see that Intel's previously unannounced deal with Cray is finally starting to deliver the goods. Intel should just get it over with and buy Cray. They've wanted back into the supercomputer business for a while now anyway.

  34. Larrabee's success will depend on... by dadad · · Score: 1

    Larrabee's support for Direct3D and OpenGL will determine its course of life. The reason is twofold. First, whether Larrabee-only games and applications (those that bring out the full functionality of Larrabee) get written will depend on its popularity. Second, for Larrabee to be adopted, it needs to support existing libraries like Direct3D and OpenGL. 1) The first part should be fairly obvious - it takes time to make a game or an application, so developers need an incentive to make one for Larrabee, which will not come to them until Larrabee has some share of the graphics-hardware market. 2) The second part is true because Larrabee-only games and applications do not exist yet. Consumers will not spend 200-300 dollars on a piece of hardware only because of its specifications. They will need to see its applications to buy it. And since no application with Larrabee in mind has been written yet, it will need to support existing applications, and they all use existing libraries such as Direct3D and OpenGL. Thus, Larrabee's success depends on its support for Direct3D and OpenGL, and Intel developers developing drivers for Larrabee will be responsible for it.

    1. Re:Larrabee's success will depend on... by Courageous · · Score: 1

      Well, I'm pretty sure that Intel envisions DirectX being the driver for consumers, yes. However they are looking at GPGPU as a threat, and want to own that space lest it take off.

      C//

  35. Re:If Intel are smart they will mix Core and Larab by smallfries · · Score: 1

    Awww how sweet. Henk has registered a name troll just for me. Poor guy, that's a lot of issues for such a sweet child to carry around.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  36. Not when you're working for a living by tepples · · Score: 1

    You don't have to use separate functions in C. C does have GOTO's, thank you very much :)

    The subset of C enforced by many employers' coding standards lacks the goto keyword.

    I'd be happy if someone could tell me a better way

    If you can't get your boss to amend the coding standards to allow use of goto to handle exceptions in C, the better way involves leaving your employer. But that isn't practical in this recession.

  37. Translation, help we are stuck! by miffo.swe · · Score: 1

    I read all this multicore sales pitch as just whining about not being able to deliver faster cores in today's CPUs. Having a couple of cores at hand is nice on a desktop. Having four on a server is nice. But most workloads aren't easily run on multiple cores. Virtualization won't get that much help from a 16-core chip, since the I/O subsystem in a normal server will be long overused before you have stressed an 8-way CPU to the max in most cases.

    What we need is faster CPU cores, not more of them. Since neither Intel nor AMD can deliver that, they are trying their best to make people believe that what they really need is more cores and that the software people are the ones who have hit the wall. It's just an intricate blame game where the real issue is that Moore's law has slammed into a concrete wall at 200 mph. The upgrade treadmill is on the verge of slowing down, and we can't have that, can we?

    --
    HTTP/1.1 400
    1. Re:Translation, help we are stuck! by David+Greene · · Score: 1

      Clock speeds are probably not going to get significantly higher unless we have a radical new fabrication technology. I'm not ruling that out because it's happened before, but we're really getting to some physical limits with circuitry.

      Given that, there are really two ways to speed up a single thread of execution and they go hand-in-hand. First you do some architecting to provide a better platform to program on. You do that with microarchitecture and ISA. Larrabee delivers the ISA part. IF compilers can use it well, it could provide significant speedup on some codes. Which brings us to the second way to speed up single thread performance: use the hardware more efficiently. Compilers and other tools are going to be key to future performance and programmers are going to have to get smarter about what they're doing.

      I think it's a rather exciting time because the game is changing.

      --

  38. Maybe now APL, J or K will gain some traction by a-zA-Z0-9$_.+!*'(),x · · Score: 1

    APL is the original matrix computing language, since morphed into J and K. Why handle just one number/character at a time? tOM

    --
    Epitaph: At last! Root access!
  39. It's All Hype by Nom+du+Keyboard · · Score: 1

    Until the hardware shows up at independent review sites and lives up to the rather over-the-top claims, this is all hype and FUD. As long as your current GPU provides 60fps on the games you want to play at your monitor's resolution, everything else means nothing.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  40. Re:If Intel are smart they will mix Core and Larab by renoX · · Score: 1

    >If Intel are smart they will release a chip containing one core (or 2 cores) from some kind of lower-power Core design and a pile of Larabee cores on the one die along with a memory controler

    *Ahem*, what about memory bandwidth??
    One strong point of GPUs is that they have big memory bandwidth at a cheap cost, as they use a fixed memory setup. If you put both the CPU and the GPU under the same memory controller with replaceable memory, then it's quite likely that the GPU will suffer from the lack of memory bandwidth.

    I heard that Intel acquired some PowerVR IP, probably because tile-based rendering is said to use less memory bandwidth than normal rendering. That said, PowerVR's video cards were also less powerful than their competition.

  41. GGP was not "Funny" by tepples · · Score: 1

    Had MichaelSmith's post actually been modded Funny in those two and a half hours, I might have kept my post to 140 characters or less. But in some cases, goto increases speed (due to fewer variables spilled to the stack) without diminishing maintainability, and coding standards that exclude all uses of goto based on a misinterpretation of a 1968 article by Edsger Dijkstra are one of my pet peeves.

    1. Re:GGP was not "Funny" by Blakey+Rat · · Score: 1

      Jesus shit, you're doing it again even KNOWING it's a joke.

      It's a JOKE! A JOOOOOKE!

      You must be a blast at parties, carefully explaining how your momma isn't in fact fat, she's actually average for her age and height considering the local dietary traditions of her homeland...

  42. Re:What? Is 15GB that much for a base OS install? by symbolset · · Score: 1

    Retired? Really? Not a week ago. He may have given the CEO reins over to our favorite chair tosser, but he's still Chairman of Microsoft. No doubt his stock option package is quite good.

    That's good for Microsoft, too. Three nines of companies don't long survive the loss of their founders. As Damon Runyon said, "The race may not always be to the swift, nor the battle to the strong, but that's the way to bet".

    The fall may have even begun before he retired as CEO. When SCO's backstop with Baystar dried up, Microsoft lost all of its credibility in the smoke filled rooms where the real money makes deals. Who knows how much this cost RBC and the other partners? Gates will spend the rest of his life trying to make amends, but those who suffered will never forget. You can't swing a billion dollars without somebody dies, and the dead stay dead no matter how many soup kitchens you volunteer in afterward.

    Eventually, pigeons come home to roost. The devil will have his due.

    --
    Help stamp out iliturcy.
  43. generally your right, unless its needed by cheekyboy · · Score: 1

    95% of code doesn't need to be top notch in speed, but if your code is otherwise utterly slow, i.e., one operation takes 12 minutes where another app takes 2 seconds, then you have some serious shit coders.

    Also, if your optimization/gee-whiz app takes 29 hours to process 24 hours of data, then it's obviously not going to get any customers.

    Point is, make it fast where it matters, not everywhere. Just look at iTunes: is that written in JavaScript or QuickTime scripting?

    Document, and document well, if you're doing tricky speed optimizations, or keep both slow and fast code together with a flag to run X or Y.
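
    A sketch of the "keep both, pick with a flag" idea, using a bit count as a stand-in for whatever the hot spot really is. The function names and the use_fast flag are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference version: obvious and easy to review. */
static unsigned popcount_slow(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Optimized version: sums bits in parallel within the word.
   Each step adds neighbouring 1-, 2-, then 4-bit fields; the final
   multiply adds the four byte counts into the top byte. */
static unsigned popcount_fast(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* The flag selects which version runs, so the slow one stays around
   as documentation and as a cross-check for the fast one. */
unsigned popcount32(uint32_t x, bool use_fast)
{
    return use_fast ? popcount_fast(x) : popcount_slow(x);
}
```

    Running both paths over the same inputs in a test is a cheap way to catch the day the tricky version and the readable version stop agreeing.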

    --
    Liberty freedom are no1, not dicks in suits.
  44. Re:If Intel are smart they will mix Core and Larab by Anonymous Coward · · Score: 0

    Surely they would just keep the instructions and translate them to run on Larrabee instead.

  45. Silly by Anonymous Coward · · Score: 0

    Does anyone else see the problem with this statement?

    > [G]iven software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future.

    Perhaps this will help:

    Given software that can catch a leprechaun, I will be rich in the future.