Slashdot Mirror


Intel Unveils Next Gen Itanium Processor

MojoKid writes "This week, at ISSCC, Intel unveiled its next-generation Itanium processor, codenamed Poulson. This new design is easily the most significant update to Itanium Intel has ever built and could upset the current balance of power at the highest end of the server / mainframe market. It may also be the Itanium that fully redeems the brand name and sheds the last vestiges of negativity that have dogged the chip since it launched ten years ago. Poulson incorporates a number of advances in its record-breaking 3.1 billion transistors. It's socket-compatible with the older Tukwila processors and offers up to eight cores and 54MB of on-die memory."

19 of 169 comments

  1. His name was Robert Paulson by GoNINzo · · Score: 3, Funny

    Guess the guys at Intel have been watching Fight Club a little too much.

    --
    Gonzo Granzeau
    "Nothing the god of biomechanics wouldn't let you into heaven for.." -Roy Batty
    1. Re:His name was Robert Paulson by fuzzyfuzzyfungus · · Score: 2

      "I understand. In death, we have names..."

      This is the Itanium team, but could they have chosen something a little less, er, pessimistic?

  2. Itanium flashbacks by ArhcAngel · · Score: 4, Insightful

    Does anyone else cringe when they hear Itanium? The early chips still give me nightmares.

    --
    "A person is smart. People are dumb, panicky dangerous animals and you know it." - K
    1. Re:Itanium flashbacks by Anonymous Coward · · Score: 5, Interesting

      I work with the world's foremost experts on optimizing for Itanium 2. All available compilers suck. If you are willing to invest the effort to hand-tweak, you can squeeze amazing performance out of the processors. They are extremely memory bound (hence the 54MB of cache now on chip). It is usually faster to recalculate numerical values than to fetch stored results.

      We work with large high performance computing systems/clusters. IBM Power 7 is fastest hands down for numerical work if you plan to use the crap output from the compiler directly. Recent Intel Xeon is as fast as Power 7 if you adjust all the fiddly settings and use some trial and error, but Xeon doesn't scale well for Symmetric Multi-Processing (SMP). Itanium 2 wins by a bit if you invest huge effort. Power 7 would probably be fastest overall for numerical work if we invested the same effort into optimizing that we do for Itanium. However, we don't have to invest the effort for Power 7 to be "fast enough".

    2. Re:Itanium flashbacks by TheLink · · Score: 3, Interesting

      Would the same optimizations for the Itanium work OK for the Itanium 2 and for the upcoming Itanium? Or would the optimizations be too generation specific?

      AFAIK the problem with the Itanic was that it was better at "embarrassingly parallel" problems. But that meant you could usually get the same (or better) performance with two or more x86 servers at a lower cost... And the x86 processors would do better than the Itanic on code that's not been optimized by super experts.

    3. Re:Itanium flashbacks by mevets · · Score: 4, Funny

      | I work with the world's foremost experts on optimizing for Itanium 2....

      So when your whole team orders lunch, do you get a medium pizza or a large?

    4. Re:Itanium flashbacks by Nikker · · Score: 4, Funny

      You might be underestimating the size of our employee.

      --
      A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.
    5. Re:Itanium flashbacks by Mr+Z · · Score: 2

      The cache has to be backed by *something*. Either that, or you have to have some protocol wherein when you kick the last copy of something out of one cache, you arrange for it to get stored in another, in which case it isn't really a cache at that level so much as a set of dynamically assigned addresses system-wide.

      Consider a smaller system first as an example. Suppose you had two CPUs, each with only one level of cache, and each with 1MB. That's 2MB total system memory. Now suppose the first CPU reads through all 2MB twice, bringing all of it through its 1MB cache. It'd have to send the bits it wasn't using to the other CPU's cache while they weren't stored in its own cache, i.e. it'd have to exchange data with the other cache as opposed to merely copying it as caches do today. The net effect is that you're just changing the current home for the data, since there's no system RAM outside of these "caches" that serves as the canonical home for the data.

      It's not that such a system couldn't work, but it would probably be impractical for large memories and certain workloads. For example, suppose both processors are reading from the same large database, and the database is largely read-only. In this system, there's no opportunity for both CPUs to simultaneously have a copy of the data that they're both sharing. The data would have to ping back and forth.
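
      A toy model makes the ping-pong cost concrete. This is only a sketch: it assumes each block has exactly one home at a time (no copies), and the function name, block counts, and access patterns are invented for illustration.

```python
# Toy model: two CPUs whose "caches" are the only storage, so every
# block lives in exactly one place. All names here are hypothetical.

def count_migrations(accesses, initial_home):
    """Count how often a block must move between CPUs.

    accesses: sequence of (cpu, block) pairs in program order.
    initial_home: dict mapping block -> cpu that holds it at the start.
    """
    home = dict(initial_home)      # work on a copy; the caller's dict is untouched
    migrations = 0
    for cpu, block in accesses:
        if home[block] != cpu:
            home[block] = cpu      # the block changes home; no copy stays behind
            migrations += 1
    return migrations


blocks = range(4)
start = {b: 0 for b in blocks}     # everything starts on CPU 0

# Both CPUs take turns reading the same four read-only blocks.
shared_reads = [(cpu, b) for _ in range(3) for cpu in (0, 1) for b in blocks]
# Only CPU 0 reads the blocks, the same total number of times.
private_reads = [(0, b) for _ in range(6) for b in blocks]

print(count_migrations(shared_reads, start))   # 20 of 24 reads force a move
print(count_migrations(private_reads, start))  # 0
```

      With ordinary caches both CPUs would simply keep their own copy of the read-only blocks, and the shared case would cost no more than the private one.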

      You could combine this with more ordinary caches. For example, add another 512K cache to each CPU between the CPU and the 1MB I described above. Now the CPUs could each have copies of read-only stuff and shared stuff, but the final home for the data would still migrate among the CPUs. It'd perform OK until you stopped fitting in the smaller cache. This is roughly what large NUMA machines do with page migration. The operating system has to manage that though to make it work well. Normal virtual memory translation mechanisms allow you to move a virtual page to different physical memories attached to other CPUs without the application knowing. It's like the cache idea you described, but managed by the OS.
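
      A rough sketch of why the application never notices a page migration: only the page table entry changes, while the virtual address stays the same. The class, the 4096-byte page size, and the node/frame numbers are assumptions made up for this example, not any real OS interface.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class PageTable:
    """Hypothetical page table: virtual page number -> (node, physical frame)."""

    def __init__(self):
        self.mapping = {}

    def map(self, vpn, node, frame):
        self.mapping[vpn] = (node, frame)

    def translate(self, vaddr):
        """Return (node, physical address on that node) for a virtual address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        node, frame = self.mapping[vpn]
        return node, frame * PAGE_SIZE + offset

    def migrate(self, vpn, new_node, new_frame):
        # A real OS would also copy the page contents and invalidate stale
        # TLB entries; this sketch only updates the mapping.
        self.mapping[vpn] = (new_node, new_frame)


pt = PageTable()
pt.map(vpn=5, node=0, frame=10)
vaddr = 5 * PAGE_SIZE + 12          # the address the application uses

before = pt.translate(vaddr)        # served from node 0
pt.migrate(vpn=5, new_node=1, new_frame=3)
after = pt.translate(vaddr)         # same virtual address, now served from node 1
```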

      The problem with huge hardware caches is the lookup penalty. For each cache block, you have to keep track of what address maps to the cache block. If you're distributing this through the system, then you need some sort of directory mechanism to find the cache block of interest. This isn't intractable mind you--it's a similar problem to finding a cell phone in the cell phone network, but it is expensive and complicated to get right in hardware. And unlike the cell phone network, it can't just send a call to voice mail when it can't find what it's looking for. That's why we push much of this to software and let the OS handle it at the page level. It can apply more sophisticated algorithms, and crucially, it can be patched when we find bugs.
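
      The directory idea can be sketched in a few lines. Everything here is hypothetical (the class name, the 64-byte block size, the node ids), and a hardware directory would be a distributed structure rather than one dictionary; the point is only that every block needs an entry and every miss needs a lookup.

```python
class Directory:
    """Hypothetical directory mapping a block address to the node caching it."""

    def __init__(self, block_size=64):
        self.block_size = block_size   # assumed cache block size in bytes
        self.owner = {}                # block number -> node id

    def _block(self, address):
        return address // self.block_size

    def record(self, address, node):
        """Note that `node` now holds the block containing `address`."""
        self.owner[self._block(address)] = node

    def locate(self, address):
        """Return the node holding the block, or raise if nobody does."""
        block = self._block(address)
        if block not in self.owner:
            # No "voice mail" fallback: the load cannot complete without data.
            raise LookupError(f"block {block} has no home")
        return self.owner[block]


directory = Directory()
directory.record(0x1000, node=2)
print(directory.locate(0x1010))        # same 64-byte block as 0x1000, so node 2
```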

    6. Re:Itanium flashbacks by Chris+Mattern · · Score: 2

      So it runs like a bat out of hell when you massage it correctly. Good for you. Know what we call a processor that nobody can write a decent software stack for? A shitty processor.

    7. Re:Itanium flashbacks by TheRaven64 · · Score: 2

      | I work with the world's foremost experts on optimizing for Itanium 2. All available compilers suck.

      I sometimes work on compilers for HPC, and this is caused by two, related, things. The first one is that no one cares. Itanium is such a small market that, even if you can get both Itanium users to buy your compiler, it's not worth the investment.

      | IBM Power 7 is fastest hands down for numerical work if you plan to use the crap output from the compiler directly.

      POWER 7 is a pretty generic RISC design with a few CISCy tweaks. We've got 40 years and millions of dollars of research to look at when designing compilers for it. For Itanium? Not so much. It doesn't help that Itanium is so unlike everything else that it's hard to share optimisations. Optimisation code in the middle end can be shared easily among MIPS, SPARC, and POWER, and mostly with x86, but with Itanium it may actually be pessimising the code.

      It's a shame, because I actually like a lot of things about the Itanium, and I'd be quite interested in working on optimisations for it, but no one is willing to pay enough to make it worthwhile.

      | However, we don't have to invest the effort for Power 7 to be "fast enough".

      POWER has the same advantage that C has, when it comes to performance: a pretty crappy compiler can give good performance. Take a look at TCC sometime - it's about as primitive as it's possible to be and still be a C compiler, yet still delivers okay (though not great) performance in most cases. Now compare a crappy Java implementation to a good one - the difference is far more pronounced. It's the same with Itanium - the compiler needs to be very clever to get good performance, and for the same reason. C presents the programmer with an abstract machine that is basically a PDP-11, and the POWER 7 is sufficiently similar to a PDP-11 that it's a pretty trivial mapping. Once you've done that, you can tweak for more performance. Java presents an abstract model closer to a B5000, so you first have to map that to something more like a PDP-11, then to the native chip. The Itanium provides a concrete model that is not very like the C abstract model, so compiling C for Itanium is a pain. Something like APL might be easier, but I've not heard of anyone actually using APL for a while.

      --
      I am TheRaven on Soylent News
    8. Re:Itanium flashbacks by MiniMike · · Score: 2

      They order 8 personal pizzas, all to be put in the same box. Only one person at a time is allowed to remove a slice.

    9. Re:Itanium flashbacks by Simon80 · · Score: 2

      I'm not a hardware guy, but what you're saying is out of touch with the design tradeoffs CPU designers already make for the following reason. Memory that runs at CPU speed already exists, that's what registers are. Then you have L1 cache, which takes a few cycles to access, L2, which takes I don't remember, a few dozen cycles, and memory, which takes hundreds of cycles, even longer if the TLB cache is being missed as well. Obviously things go faster if you have more registers, more L1, etc., but it's a tradeoff between cost, opportunity costs, and performance. Even if you had copious amounts of these on-die caches, you'd still want more RAM, so that you'd have something to put into the caches instead of limiting the working set size to the size of the cache! Furthermore, there is a hierarchy of storage that looks something like registers > L1 > L2 > L3 > memory > SSDs > disk storage > LAN > WAN. Of course, disk storage would be behind information stored in memory of another CPU node in a supercomputer, but anyway, the point is that the faster tiers are there to make access to the slower ones seem faster than it would be if the slow tiers were accessed directly. If you want to be able to write fast software, I suggest you read Ulrich Drepper's What Every Programmer Should Know About Memory. It's not that long, and very informative.
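
      To put rough numbers on the tiers, here is a back-of-the-envelope average access time calculation. The latencies and hit rates are illustrative assumptions chosen to match "a few cycles", "a few dozen cycles", and "hundreds of cycles"; they are not measurements of any particular chip, and the function name is made up.

```python
def average_access_cycles(levels, memory_cycles):
    """Estimate the average cost of a memory access, in cycles.

    levels: list of (hit_cycles, hit_rate) pairs, fastest cache first.
    memory_cycles: latency of main memory, paid when every cache misses.
    """
    total = 0.0
    reach = 1.0                       # fraction of accesses that get this far
    for hit_cycles, hit_rate in levels:
        total += reach * hit_cycles   # everything reaching this level pays its latency
        reach *= 1.0 - hit_rate       # only the misses go on to the next level
    return total + reach * memory_cycles


# Assumed figures: L1 = 4 cycles with 90% hits, L2 = 30 cycles with 80% hits,
# main memory = 300 cycles.
with_caches = average_access_cycles([(4, 0.9), (30, 0.8)], 300)
no_caches = average_access_cycles([], 300)
print(with_caches, no_caches)         # roughly 13 cycles versus 300
```

      Under these assumed numbers the caches make memory look more than twenty times faster on average, which is the sense in which the fast tiers exist to hide the slow ones.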

  3. Just one thing... by fuzzyfuzzyfungus · · Score: 4, Funny

    Is it more resistant to icebergs than the previous itanics?

    1. Re:Just one thing... by gstoddart · · Score: 4, Funny

      | Is it more resistant to icebergs than the previous itanics?

      This one melts right through them.

      --
      Lost at C:>. Found at C.
  4. Ultimate Computer of Failure by pezpunk · · Score: 3, Funny

    ITANIC processor
    RAMBUS memory
    Voodoo5 video card
    i can't think of a hard drive crappy enough ... maybe you could have the OS installed on an external drive connected via USB1.0.

    obviously the OS would be WindowsME.

    --
    i could live a little longer in this prison
    1. Re:Ultimate Computer of Failure by nschubach · · Score: 4, Insightful

      IBM/Hitachi Deskstar AKA: Deathstar

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
  5. Re:Whytanium? by the+linux+geek · · Score: 5, Informative

    Itanium is the #2 high-end UNIX server processor, ahead of SPARC but behind POWER. Itanium systems get between $4bn and $5bn in sales, and are growing. It didn't meet the original goal of taking over the world, but I don't know what parallel universe you live in to think it's a failure.

  6. Re:Isn't it strange... by Olivier+Galibert · · Score: 3, Insightful

    That's because the high-end server world accepts a level of single-core performance the consumer world doesn't. These processors are not something you want on your PC. You want something with better memory management, way faster I/O with RAM and GPU, etc. OTOH, you usually don't care about multi-processor.

    But faster I/O usually means putting more things on the die (hence AMD's integrated memory controllers, now followed by Intel) and having larger busses/more efficient protocols, and acting on that means changing the socket. And the north bridge, if one is left. And the memory, for a faster one. You wouldn't get enough speedup from changing the CPU alone with everything else pin-compatible to make it worth it.

    Meanwhile, the itanic spends its time waiting for the RAM to answer... but since you put a lot of them in the box, in aggregate they can be useful.

        OG.

  7. Re:Marketing at it finest by the+linux+geek · · Score: 2

    Yeah, outside the US, Itanium is big on mainframes. Fujitsu, NEC, Bull, and (I think) Hitachi all run proprietary mainframe OSes on IA64, and at least in their home countries they do a pretty good business.