Slashdot Mirror


Linus Has Harsh Words For Itanium

Anonymous Coward writes "As a follow up to the earlier story "Intel: No Rush to 64-bit Desktop"... In words that Intel are likely to be far from happy with, the Finnish luminary has stuck the boot into Itanium. His responses to some questions on processor architecture are sure to be music to AMD's ears. Linus, in an Inquirer interview concludes: "Code size matters. Price matters. Real world matters. And ia-64... falls flat on its face on ALL of these."" Of course, Linus works for a chip maker ;)

49 of 787 comments

  1. ... AND FOR THE RECORD by YOU+ARE+SO+FIRED! · · Score: 3, Informative

    This is from the Linux-Kernel mailing list, not an Inquirer interview. Here is the post.

  2. Re:Obsessive by leviramsey · · Score: 2, Informative

    Seeing as Intel was (still is?) a major backer of Red Hat, I'd imagine that Red Hat's kernel hackers have already ported it and will see to it that support for Itanium makes it into the release kernel.

  3. Just a little reminder by NedTheNerd · · Score: 2, Informative

    The fact that Linus works for a chip maker doesn't really matter, because he doesn't develop the chips. He gets paid there to develop the Linux kernel.

  4. Transmeta and x86-64 by neuroslime · · Score: 2, Informative

    "It's worth noting that Torvalds' employer, Transmeta, has licensed x86-64 so he is likely to have access to Hammer hardware." This sounds really interesting. Any ideas what it means?

  5. Re:All in one by gearheadsmp · · Score: 2, Informative

    Well, you kind of forgot how MS server 2003 doesn't have support for x86-64, but has support for Itanium II. Can't leave MS out of the fray ;)

  6. Re:Obsessive by petong · · Score: 3, Informative
    Red Hat already has an ia-64 release that costs like $700.

    But you can always download Debian for the ia-64 architecture for free...

    --
    Libation.com - Fine wine and beer

  7. Re:When is the... by sweetooth · · Score: 3, Informative

    If I recall correctly, the Crusoe processor is 128-bit. It is simply executing 32-bit code through "code morphing".

  8. Re:the reason the Itanic is a bomb.. by umofomia · · Score: 2, Informative
    Trouble is Windows runs on i386 and uh uh uh, it runs on i386 and uh uh uh. Well, it only runs on souped up versions of the 80386, and I'll bet it'll never run on anything else.
    Completely untrue. Microsoft has builds of Windows in ia64 and even AMD's 64-bit architecture. They certainly will be released with Windows Server 2003.
  9. This is probably a troll, but... by tunah · · Score: 4, Informative
    Slang:

    Mickey-mouse == poor quality, inconsistent

    Outfit == organization, company.

    --
    Free Java games for your phone: Tontie, Sokoban
  10. Re:Obsessive by dildatron · · Score: 3, Informative

    Mandrake also has an ia64 port of their distribution. I just installed it a few weeks ago (version 8.1 I believe).

    RedHat does as well, but their installer would lock up at the end of the install every time, with no errors in the install log. I installed Mandrake after I could not get RedHat to install.

    This was on a first generation (lion) itanium.

    --


    If you had nuts on your chin, would they be chin nuts?
  11. Re:Itanium 2 is great by Billly+Gates · · Score: 3, Informative
    "Itanium 2 is a great architecture/....


    What the hell are you smoking? I want some.

    Every RISC architecture with the exception of the SPARC3 performs better, especially IBM's POWER4 and the upcoming POWER5.

    Also, there is more than speed when comparing architectures. Itanium is a terrible platform to write compilers for. A lot of optimizations which are traditionally done in the chip itself at runtime must be set by compiler options. Not all of it can be done efficiently like this.

    Speed-wise, Alpha is getting old now but is still the fastest chip around until the POWER5 comes out this fall. For coding and optimization, MIPS is the best CPU around.

  12. Re:How to improve x86 by _typo · · Score: 4, Informative
    but I think that makeing them ALL GP (even the older ones) would be good, and maybe bring up the number of registers to a good round 32 or something. Am I missing something glaring wrong?

    Well, the only reason why the other registers aren't GP on x86 is that there are instructions that use them implicitly. If you don't care about these instructions you can use them as regular registers.

    As an example the EDI register is used by the SCAS* instructions as a pointer to memory. If you don't care about the instructions that use this register like that you're free to do regular operations on the EDI register, it has no limitations on what you can do with it.

    You're right to say that there are few registers though. Before I learned x86 I learned MIPS and there you got all the glory of 28+ GP registers. In the simple examples we did I never needed to push and pop from the stack.

    --

    Pedro Côrte-Real.

  13. Re:Code size? by hpa · · Score: 5, Informative

    Code size matters because *cache* isn't cheap. Worse, you can't make L1 cache arbitrarily fast without slowing down your chip big time.

  14. Re:Code size? by ChadN · · Score: 2, Informative

    Number 2 (make cache bigger) is easier said than done, and works against number 1 (cost).

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
  15. Itanium2 is the fastest floating-point processor by mrm677 · · Score: 4, Informative

    Check the latest SPEC CPU benchmarks. The Itanium2 has the fastest floating-point score and is no slouch in the integer tests either. It will improve. Linus will eat his words in a few years.

  16. Re:the benifits of 64bit processors? by questionlp · · Score: 2, Informative

    There are 32-bit x86 processors that can handle more than 32-bit memory addressing; the Intel Xeon processors come to mind (they can address up to 36 bits). The only problem is that the application and OS need to support windowing or PAE (Physical Address Extension) to allow use of more than 4GB of memory.

    The downside of windowing and PAE is that there is a long delay (from the processor's point of view) to shift the window compared to accessing something within that window. On the bright side, the delay is nowhere near as bad as having to go to virtual memory and paging files.
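
    A quick sketch of the arithmetic behind the 32-bit versus 36-bit figures above. The helper name is made up for this example, and it assumes plain byte addressing.

```python
# Sketch: how much physical memory N address bits can reach.
# The 32- and 36-bit widths come from the comment above; the helper
# name is hypothetical.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return the number of GiB reachable with byte addressing."""
    return (2 ** address_bits) // GIB


plain_32bit = addressable_gib(32)   # flat 32-bit physical addressing
with_pae = addressable_gib(36)      # 36-bit physical addressing (PAE)

print(f"32-bit: {plain_32bit} GiB, 36-bit: {with_pae} GiB")
```

    Four extra address bits multiply the reachable memory by 16, which is where the jump from 4GB to 64GB comes from.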

  17. Re:AMD by Xerithane · · Score: 2, Informative

    [AMD] Was recently considering leaving the CPU business altogether

    Uh.. what? AMD can't leave the CPU business. That would leave them with.. Flash memory. We all know how much revenue that brings in for them.

    You have any links to support this claim?

    --
    Dacels Jewelers can't be trusted.
  18. Re:AMD by Anonvmous+Coward · · Score: 2, Informative

    AMD Was recently considering leaving the CPU business altogether."

    Um when was that? The only thing I recall was a Slashdot article with a misleading headline...

  19. Inquirer does not do the post justice by NullProg · · Score: 4, Informative

    The read from theinquirer.net is all wrong. The slashdot story line is also wrong. It does not state at all what it implies. Here is the link to what Linus actually wrote:

    http://www.ussg.iu.edu/hypermail/linux/kernel/0302.2/1909.html

    Now, I agree with Linus on the PPC MMU issue. Can anyone tell me what he means by "baroque instruction encoding"? I have been doing x86 and 68k assembler for a long time, I have never heard of this.

    Enjoy,

    --
    It's just the normal noises in here.
    1. Re:Inquirer does not do the post justice by wmshub · · Score: 2, Informative
      Can anyone tell me what he means by "baroque instruction encoding"?

      "Baroque" in this case means "overly complicated, usually in a bizarre way." If you ever wrote an x86 assembler or disassembler, you would know exactly what Linus was talking about.

    2. Re:Inquirer does not do the post justice by cbiffle · · Score: 5, Informative

      Probably (IANLT) he's referring to the various prefix encodings and variations for instructions. From my x86 manual, "Machine language instructions...vary in length from 1 to as many as 13 bytes. ...There are over 20,000 variations."

      Now, granted, that rather large number probably includes different target registers, but compared to (to use your example) the 68k, the x86's encoding format is just -weird-:
      16-Bit:
      -An opcode. Either 1 or 2 bytes.
      -Some flags and/or target register. 1 byte, optional
      -Displacement. 0-2 bytes.
      -Immediate. 0-2 bytes.

      32-Bit:
      -Optional address size prefix byte.
      -Optional operand size prefix byte.
      -Like above, but with 0-4-byte displacement/immediate and optional scaled index byte.
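
      To put rough numbers on the two field lists above, here is a naive sketch that just sums the per-field byte ranges as listed. It assumes every field can be combined freely, which is not true of real x86, so the maxima are loose bounds on the listing and not architectural limits. All names are invented for the example.

```python
# Naive byte-count bounds for the field lists above. Each entry is
# (field name, min bytes, max bytes), copied from the post. Not every
# combination is a legal instruction, so treat the maxima as loose
# upper bounds on the listing, not as architectural limits.

FIELDS_16BIT = [
    ("opcode", 1, 2),
    ("flags/target register", 0, 1),
    ("displacement", 0, 2),
    ("immediate", 0, 2),
]

FIELDS_32BIT = [
    ("address-size prefix", 0, 1),
    ("operand-size prefix", 0, 1),
    ("opcode", 1, 2),
    ("flags/target register", 0, 1),
    ("scaled index byte", 0, 1),
    ("displacement", 0, 4),
    ("immediate", 0, 4),
]


def length_bounds(fields):
    """Sum the per-field minima and maxima."""
    shortest = sum(lo for _, lo, _ in fields)
    longest = sum(hi for _, _, hi in fields)
    return shortest, longest


print(length_bounds(FIELDS_16BIT))  # (1, 7)
print(length_bounds(FIELDS_32BIT))  # (1, 14)
```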

      Now consider the fact that many opcodes implicitly reference registers. Decoding this instruction set by hand would be a royal bitch, and it's exactly the sort of configuration that RISC targeted for demise.
      However, Linus makes a good point in the e-mail, which is (paraphrased) that the x86 encoding is basically a very good compression algorithm for its code. While the RISC machines that use 32 or 64-bits for every instruction may be more regular, their code does tend to be larger.

      The ironic thing, in my mind, is that the IA-64's encoding is in many ways -more- baroque than the x86s! Instructions in bundles, bundles in groups (or is it the other way around? I never remember), flags at the end to specify how to interpret the instructions before -- it's an interesting take on VLIW, in that it doesn't specify the number of execution units, but YUCK. :-)

  20. Re:How to improve x86 by be-fan · · Score: 4, Informative

    What you're missing is that x86 chips have a ginormous amount of internal rename registers (128 in a P4). The bump to 16 *visible* registers in the Athlon-64 is to allow the compiler optimizer to give more information to the CPU about variable usage. I'm guessing that AMD found that more than 16 visible GPRs really didn't help the compiler's allocation routines any.
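
    A toy sketch of what renaming buys you, assuming a 128-entry physical register file as quoted above. The class, the register list, and the "never free a register" shortcut are all simplifications for illustration, not how any real chip does it.

```python
# Toy register renamer. The 128-entry physical file matches the P4
# figure quoted above; everything else (class name, register names,
# no freeing of old registers) is a simplification for illustration.

class Renamer:
    def __init__(self, arch_regs, physical_count=128):
        self.free = list(range(physical_count))  # unused physical regs
        # Every architectural register starts with its own physical one.
        self.table = {name: self.free.pop(0) for name in arch_regs}

    def read(self, name):
        """Physical register currently holding this architectural register."""
        return self.table[name]

    def write(self, name):
        """Give a new value of `name` a fresh physical register."""
        self.table[name] = self.free.pop(0)
        return self.table[name]


r = Renamer(["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"])
first = r.write("eax")    # first value written to eax
second = r.write("eax")   # unrelated later value, also written to eax
print(first, second)      # two different physical registers
```

    Because the two writes to eax land in different physical registers, work on the second value does not have to wait for users of the first, even though the program only names eight registers.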

    --
    A deep unwavering belief is a sure sign you're missing something...
  21. which architectures? by Anonymous Coward · · Score: 2, Informative

    MIPS is behind Itanium in performance. HP-PA is behind Itanium in performance. SPARC3 is behind Itanium in performance. SPARC64V is behind Itanium in performance. Alpha has higher specint but lower specfp. Power has higher specint but lower specfp. Both major current IA32 processors have higher specint, but they are slaughtered on specfp.

    That's without even mentioning TPC or Java benchmarks which make Itanium look just as good or better.

  22. Re:Linus too Harsh by penguinboy · · Score: 3, Informative

    While the systems can use 4GB of RAM, applications can't have the entire 4GB. RAM must be split into two segments - OS and Apps - usually a 2/2 or 1/3 split.

  23. Re:Itanium2 is the fastest floating-point processo by mrm677 · · Score: 2, Informative

    The SPEC benchmarks are real-world. That's the point of them, and they've been used over the last 10 years to judge the real performance of a processor.

  24. Re:Most home users run Windows. by captredballs · · Score: 2, Informative


    Don't forget that Itaniums are clocked far lower than P4's. The difference is that Intel doesn't plan on marketing 64bit chips to the consumer for a couple years, while AMD has their sights set earlier due to the expected lifespan on the Athlon-family and that their future is bet on 64bits.

    I guess the main thing to note is that the P4 will be around for at least two years longer, where you can't say the same thing about the Athlon family, at least at the high end.

    Also coming into the picture is that Apple may have 64 bit workstations in ~ a year.

    --

    I suppose I'm not too threatening, presently, but wait till I start Nautilus
  25. Uh... it's already there by Anonymous Coward · · Score: 2, Informative

    ia64 is in the mainline kernel. At least Debian and Red Hat have released stable distributions for it. Red Hat even sells support for it.

    ia64 is "in there" as much as alpha and sparc, even if it isn't quite as well tested.

  26. Re:the reason the Itanic is a bomb.. by wideBlueSkies · · Score: 3, Informative

    NT was built on the i860 first, then ported to the i386 arch. More accurately, MS engineers emulated the i860 until the chip was ready.

    MS did this to make their new OS more or less platform independent. They didn't want to get 'stuck' on the x86.

    Slashdot story here . Article here .

    --
    Huh?
  27. TPC would say differently by Yankovic · · Score: 2, Informative

    The second highest rated TPC box in the world is running Itaniums...

    http://www.tpc.org/tpcc/results/tpcc_perf_results.asp?resulttype=noncluster

  28. Re:Linus too Harsh by Dynedain · · Score: 2, Informative

    Guess what? CAD users have been the driving force in high-end workstations for quite a while now. Current machines still aren't sufficient to do near-photorealistic design in real time, and they won't be anytime soon. Until then, this niche market (if I'm an anomaly, why is Autodesk such a huge developer and Microsoft's biggest supporter?) will continue demanding better.

    --
    I'm out of my mind right now, but feel free to leave a message.....
  29. Re:the benifits of 64bit processors? by silas_moeckel · · Score: 2, Informative

    Hrm, I'm running at least 2 gigs on a workstation (4 512-meg sticks) and as much as I can cram into a server (PIII servers have 1-gig chips cheap enough). C'mon, a modern video card has 128 megs of RAM on it; with the exception of Rambus, RAM is comparatively cheap. Under Linux, RAM can be one of the largest speedups out there. I run about a 1-gig used memory heap on my workstation and another 2 gigs that ends up being drive cache for a mid-size SCSI RAID, and that cache makes all the difference in the long run.

    --
    No sir I dont like it.
  30. Re:the return of "worse is better" by Anonymous Coward · · Score: 1, Informative

    "lay waste" means precisely the opposite of "lie in ruins" -- it means "inflict devastation" / "inflict devastation upon".

    The second rendering is because you can use the phrase transitively: "Caesar lay waste the village". (Not lay waste upon or lay waste to.)

    I just thought I'd point that out. Your sentence as quoted doesn't make sense.

  31. A Slashdot Sin... by SleezyG · · Score: 3, Informative

    I know it's not very nerd-like to say that Linus is wrong and that AMD sucks, but in the case of the Itanium, that is exactly how I feel. Intel/HP's Itanium architecture is perhaps the most advanced processor to hit the market and has tremendous potential (from a Computer Architecture point of view). Because it's so new, its performance will be awful, but shall improve with time. Anyone remember the SuperSparc? It performed horribly and was soon replaced by the UltraSparc. As will the Itanium II replace the Itanium.

    As for the emulation/legacy code argument, I say screw it. gcc is already ported to IA-64. And as a Linux user, most of my favorite open source programs can be ported with little difficulty.

  32. Re:Itanium 2 is great by redgren · · Score: 2, Informative

    They have so many pins because it is not a single cpu. It is an MCM (multi-chip-module). Each Processor "Brick" contains up to 8 CPU cores.

    Current draw is around 250-300A for an MCM. A lot? Hell yeah, but your average Athlon XP pulls about 35A. 8 x 35 = 280.

    So, not so big a difference.

  33. Re:Linus holding on to his security blanket? by squiggleslash · · Score: 5, Informative
    I just don't get RISC chips. Why they want to remove things that make programming easier is beyond me.
    It's not stuff that "makes programming easier" that gets removed in RISC, it's the more complex try-to-do-everything instructions which are a pain to implement in silicon and which ultimately can just as easily be done via a sequence of simpler instructions that may well, for a programmer who's actually programming at that level, be easier to understand.

    Early on the chief advantage of the approach was that you could use the freed silicon for things like extra registers, and that's exactly the approach taken by Acorn (now ARM) and the PowerPC range. Would you prefer to have eight registers and a single byte copy-block instruction, or 64 registers and have to replace that copy-block instruction with (*gasp*) three simpler instructions?

    (Actually, I guess that depends on how good your cache is. There's no such thing as a free lunch)

    --
    You are not alone. This is not normal. None of this is normal.
  34. Re:But it's still a year away, isn't it? by KewlPC · · Score: 4, Informative

    64-bit code needs 8 bytes to hold every pointer. This will serve to eat up more cache and memory bandwidth, which are already major bottlenecks for any CPU.

    The only thing this eats up is cache; because the system has a correspondingly wider data bus, there isn't a hit in memory bandwidth (unless the designers are trying to be cheap bastards and give a 64-bit CPU the same data bus width you'd use for a 32-bit CPU). And most 64-bit CPUs have a lot of cache.

    And as for what kind of applications you potentially need several gigabytes worth of memory for, there's scientific processing and the like.
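
    To make the cache cost of wider pointers concrete, here is a small sketch for a hypothetical list node holding one 32-bit integer and two pointers. The node layout and the 64-byte cache line are assumptions for the example, and alignment padding is ignored.

```python
# Size of a hypothetical list node (one 32-bit integer plus `next` and
# `prev` pointers) with 4-byte and 8-byte pointers. The node layout and
# the 64-byte cache line are assumptions; padding is ignored.

CACHE_LINE_BYTES = 64  # assumed line size
PAYLOAD_BYTES = 4      # one 32-bit integer
POINTERS_PER_NODE = 2  # next and prev


def node_bytes(pointer_bytes: int) -> int:
    """Bytes per node for a given pointer width."""
    return PAYLOAD_BYTES + POINTERS_PER_NODE * pointer_bytes


for pointer_bytes in (4, 8):
    size = node_bytes(pointer_bytes)
    print(f"{pointer_bytes}-byte pointers: {size} bytes per node, "
          f"{CACHE_LINE_BYTES // size} whole nodes per line")
```

    Under these assumptions the node grows from 12 to 20 bytes, so the same cache holds noticeably fewer of them. How much that matters depends entirely on how pointer-heavy the data is.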

  35. Re:But it's still a year away, isn't it? by Waffle+Iron · · Score: 4, Informative
    The only thing this eats up is cache; because the system has a correspondingly wider data bus, there isn't a hit in memory bandwidth (unless the designers are trying to be cheap bastards and give a 64-bit CPU the same data bus width you'd use for a 32-bit CPU)

    Ever since the 8086/8088 duo, the bus width of a CPU has been decoupled from its word size. For a long time, the external bus width of (non Rambus) 32-bit CPUs has been wider than 32 bits. This works because the memory unit fetches entire cache lines. The CPU designers could be less cheap bastards today and bring out 32-bit CPUs with 256-bit wide busses if they wanted to.

    And most 64-bit CPUs have a lot of cache.

    You could put a lot of cache in a 32-bit CPU. You could put a small cache in a 64-bit CPU. In fact, the biggest difference between high-end and low-end CPUs is just the size of their caches.

    To be fair, the current Itanium has an enormous cache that uses the vast majority of the die size and dictates its price and power consumption. Its logic core really isn't that big. If you embedded an X86 core in all of that cache, you'd get a very fast chip. If you teamed up an Itanium core with a Celeron cache, you'd get Celeron-level performance. 64 bits has little to do with it; you're mostly paying for cache and bandwidth when you buy high end CPUs.

  36. Linus is too young to remember by imnoteddy · · Score: 3, Informative
    From the article:
    Torvalds wrote that Intel had made the same mistakes "that everybody else did 15 years ago"
    when RISC architecture was first appearing.

    RISC first showed up on the commercial radar screen almost twenty years ago when MIPS Computer Systems
    was formed. But people at Stanford (and Berkeley, IIRC) had been publishing papers about
    RISC for four or five years before that, and people at IBM were working on it even before that.

    And the CDC 6600 was a RISC machine in the 1960s. If you don't believe me, ask Cray's Chief Scientist Burton Smith.

    In seeking the unattainable, simplicity only gets in the way. -- Alan Perlis

    --
    No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
  37. Re:How to improve x86 by PeterM+from+Berkeley · · Score: 3, Informative

    I attended an information session by someone from AMD at UCB. It was my understanding from his presentation that the tricks they were using to get up to 16 registers without compromising the ability to run existing 32-bit code made it impossible to get past 16 registers.

    They would've liked to have 32 registers, but it simply couldn't be done in a backward-compatible way.

    If you want more information on this, and more than a guess, AMD has much information up on its website.

  38. Uggh....what does really matter? by zerofoo · · Score: 2, Informative

    Who is Itanium good for? Who is G4 or Power4 good for? What is X86 good for?

    That's like asking what is a saw, hammer and screwdriver good for...they each have an application.

    All these architectures have their good points and bad points. I've written sparc and x86 assembler and I can't say that they are better or worse than each other....just different.

    At this point the hardware is MOOT. Unless algorithms get significantly better soon, the hardware won't matter. Sure, we'll get mega memory address space with any 64-bit architecture, but what does that get you? More memory address space? Big deal...so you've got big memory space...that won't make NP=P any time soon.

    -ted

  39. Re:But it's still a year away, isn't it? by EvilTwinSkippy · · Score: 4, Informative
    News flash...

    The pentium architecture has been loading 64 bits of memory at a time since the PII. They have to because that is the only way the RAM has a chance in hell of keeping up with the processor. Basically they load 2 instructions at once, and have them execute at double the speed of the RAM. (That's also part of why you get such a kick in the pants when you optimize with the -mcpu=i686 flag in gcc.)

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  40. Re:Obsessive by JWSmythe · · Score: 4, Informative

    It looks like all the big Linux distributions have gotten together to support the IA-64 Linux development.. This was the first hit on search "Linux IA-64" on google.


    http://www.linuxia64.org/

    Working distributions date back from 03/2000

    Straight from their page:

    IA-64 Linux Distributions
    # Caldera Systems (initial release 8/4/00) Download at ftp.caldera.com/pub/OpenLinux64
    # Debian (initial release 8/10/01) Download at www.debian.org/ports/ia64
    # Red Hat (initial release 5/17/00) Download at ftp.redhat.com/pub/redhat/ia64
    # SuSE (initial release 6/13/00) Download at ftp.suse.com/pub/suse/ia64
    # TurboLinux (initial release 3/13/00) Download at www.turbolinux.com/ia64.html

    Their short list of representative companies include: Caldera Systems, CERN, Debian, Hewlett Packard, IBM, Intel, Linuxcare, NEC, Red Hat, SGI, SuSE, TurboLinux, and VA Linux Systems.

    If you search their site, you'll see a few emails from Linus in their mailing list archives, so he's obviously involved at least to a degree (I couldn't imagine him not being involved). I dare say he's educated in the matter, and would know all the in's and out's of say putting together an OS. :)

    I'm sure support will be included eventually.. Well, maybe not.. I know Linux will run on SGI, DEC Alpha, ARM (I'm running Linux on a Compaq iPaq with an ARM CPU), so maybe they'll leave it as a patch and let folks do separate distributions.

    I guess it's all in how widely used a processor is.. Not the average Joe has an SGI, Alpha, or Itanium at their house. (I'll keep quiet about the 150Mhz SGI Indy that we use as a doorstop).

    --
    Serious? Seriousness is well above my pay grade.
  41. Re:Linus too Harsh by puetzk · · Score: 2, Informative

    If you want to allow fast syscalls (trust me, you do) you need to keep the kernel mapped permanently to cheapen the context switch from app to kernel. You also probably want to separate physical memory (mapped into the kernel space directly) and virtual space, so that you can have swap and mmap'ed files. You also probably need to keep some address space to map I/O devices into, and some for DMA buffers (unless you really want to give up DMA to get that last memory). What with all the memory on modern video cards, mapping them (to say nothing of the AGP window) is pretty huge too.

    So, unless you want to rewrite a lot of stuff and throw performance completely down the toilet, you need most of that 4GB address space for things other than app VM space. The current Linux split is 1G/3G: 1 gig to map physical RAM into the kernel and store kernel data structures, and 3G of total app address space into which devices, files, swap, or physical RAM pages can get mapped. You can also set Linux up for 2/2, I think (which gives you more physical RAM at the expense of what each app can use), or 4G/PAE (which takes the performance hit and separates the app and kernel: the apps get all 4G themselves, and the kernel uses PAE to map up to 32G in a separate way). But the performance hit is very significant unless your app uses almost no system calls or I/O (device I/O has to get copied around into lowmem for this case).
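
    The split arithmetic above, as a minimal sketch. The split sizes are the ones mentioned in the post; the function name is made up, and real kernels carve out further regions inside each share.

```python
# Sketch of the 32-bit virtual address split described above.
# Split sizes come from the post; the function name is hypothetical.

GIB = 2 ** 30
TOTAL_VIRTUAL = 2 ** 32  # bytes addressable with 32-bit pointers


def user_space_gib(kernel_gib: int) -> int:
    """GiB of address space left to each process after the kernel's share."""
    return (TOTAL_VIRTUAL - kernel_gib * GIB) // GIB


for kernel_gib in (1, 2):
    print(f"{kernel_gib}G kernel / {user_space_gib(kernel_gib)}G user")
```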

    --
    The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
  42. Re:VAX is definitely the best by hughk · · Score: 2, Informative
    The VAX architecture did end up being subsetted and some of the string instructions were dumped (to be implemented via emulation, or in-line code generated during compilation). However, block moves stayed.

    The main point about the VAX architecture is that there was very close liaison between the OS architects and the hardware developers, the result being a secure operating system that worked well with little resources.

    Interestingly enough, VMS did get ported to the Alpha, and some of the OS level MACRO-32 assembler code ended up being compiled for the Alpha. Some of the biggest apps still run on OpenVMS Alpha, and I await with trepidation the port to Itanium.

    --
    See my journal, I write things there
  43. Not quite by Rui+del-Negro · · Score: 2, Informative

    There are two issues here:

    1. There is no difference in the speed it takes to transfer data, because the bus is wider. There is also no difference in the time it takes to process data, because registers are also wider. There is a decrease in cache performance (because addresses take up more space). All other things (CPU design, clock speed, etc.) being equal, this hit would be of about 5%. It would only apply to programs running in 64-bit mode, though (the Hammer can still run in 32-bit mode, and can use 8, 16 and 32-bit pointers even in 64-bit mode, in certain instructions).

    2. AMD's x86-64 Hammer doesn't just increase the register size to 64 bits. It adds several new registers, that can (with minor adjustments in the compilers) give a pretty good speed improvement (I'd say about 10% for the same clock speed, although this will depend a lot on the specific program). It also improves the prefetch and adds SSE2 support (one of the few areas where the P4 has an edge). This should give the Hammer approximately a 20-25% improvement over an Athlon XP at the same clock speed (more, if SSE2 is used).

    RMN
    ~~~

  44. A really good CPU benchmark summary by 3770 · · Score: 3, Informative


    Go here for a really good summary of current CPUs.

    --
    The Internet is full. Go Away!!!
  45. Re:Linus holding on to his security blanket? by TheLink · · Score: 3, Informative

    Ask anyone who has done assembly language programming on x86 and a decent CISC and x86 will always lose out too.

    But the x86 has evolved a lot since the bad old days. You could regard the ugly stuff as vestiges of a primitive form and stick to saner modes.

    A larger code size can be a significant disadvantage nowadays. Imagine CISC as compressed RISC opcodes. The current situation is the CPU is VERY much faster than the RAM or even the 2nd level cache. So it's not a big deal to have to decompress (decode/expand to RISC) instructions in the CPU. You gain overall processing throughput that way.

    As long as that situation remains, larger code size is a significant issue. It means fewer programs in memory.

    True RISC processors you talk about are declining. Most are becoming more pragmatic. Which is what Linus is talking about.

    --
  46. Re:But it's still a year away, isn't it? by Anonymous Coward · · Score: 1, Informative

    L. O. L.

    The "Pentium architecture" is 3 completely separate implementations of the IA32 (32-bit x86) architecture:

    - P5 (Pentium, Pentium MMX)
    - P6 (PentiumPro, Pentium II, Pentium III)
    - P7 (Pentium IV)

    Each generation is as different from the others as 386 was from 486. One thing all the "Pentium" implementations share in common (aside from the catchy trademarkable name) is a 64 bit data bus. "i686" = P6, and optimizing for it only gives you a "kick in the pants" on P6 CPUs. It has little or nothing to do with the bus width of the Pentium chips; it's all about instruction selection and scheduling optimized for the particular (P6) implementation. That crap about loading "2 instructions at once" and "double the speed of RAM" is nonsense. You have to remember that all data come into the CPU through the caches, which are loaded 32 or more BYTES at a time from memory -- the wider bus just makes cache fills take fewer bus cycles. Alphas (64-bit) similarly had wide busses (128 or even 256 bit) for faster cache fills.
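
    A rough sketch of the cache-fill point: how many bus transfers it takes to pull in one cache line at different bus widths. The 32-byte line and the 64/128/256-bit widths are the figures from the comment above; the 32-bit row is added only for comparison, and the function name is invented.

```python
# Bus transfers needed to fill one cache line. The 32-byte line and
# the 64/128/256-bit bus widths are figures from the comment above;
# the 32-bit width is included only for comparison.

def transfers_per_fill(line_bytes: int, bus_bits: int) -> int:
    """Number of bus-wide transfers needed to move one cache line."""
    bus_bytes = bus_bits // 8
    return -(-line_bytes // bus_bytes)  # ceiling division


for bus_bits in (32, 64, 128, 256):
    print(f"{bus_bits}-bit bus: {transfers_per_fill(32, bus_bits)} transfers")
```

    This ignores burst timing and memory latency entirely; it only shows why a wider bus shortens the fill regardless of the CPU's word size.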

  47. Re:Itanium 2 is great by dr2chase · · Score: 2, Informative

    Itanium's problems were visible from the moment the architecture appeared. It is, and was, an architecture that should excel at running Fortran programs, which are much more easily optimized than code written in C, C++, or Java. Compilers written ten years ago should be able to do a decent job compiling Fortran to Itanium with only a modest amount of porting work. Problem is, people aren't just running Fortran on Itanium.

    The apparently-dynamic nature of current programs (that is, the intractability of statically analyzing them) has been coming for years. Ten years ago I spent my time studying the inner loops of SPEC benchmarks, and even then the typical inner loop of a C program was the instructions:

    compare X with a value
    branch out if equal
    load indirect through Y to get Y'
    load indirect through Y' to get X
    branch to top of loop.

    If Y (and Y', and Y'', etc) don't address memory in cache, you're hosed. Static prediction algorithms used in some of the first RISC chips (HP-PA, e.g.) work as well as any other on this loop, but you don't know that you're done until you load all the data and compare it. The loop cannot run any faster per iteration than the latency of the memory that happens to hold the data (Cache is King).
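
    The loop above is essentially a linked-list search. A minimal sketch, with `Node` and `find` as names invented for the example: each iteration cannot compute its next address until the previous load has come back, so the count of dependent loads is what the memory latency gets multiplied by.

```python
# Sketch of the pointer-chasing loop described above: each iteration
# needs the previous load to finish before it can even compute the
# next address. Node and find are names invented for this example.

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def find(head, wanted):
    """Walk the list; return (node, dependent loads performed)."""
    loads = 0
    node = head
    while node is not None:
        if node.value == wanted:   # compare X with a value, branch out
            return node, loads
        node = node.next           # load indirect through Y to get Y'
        loads += 1
    return None, loads


# Build 0 -> 1 -> 2 -> 3 and search for 3.
head = None
for v in (3, 2, 1, 0):
    head = Node(v, head)

node, loads = find(head, 3)
print(node.value, loads)
```

    Python hides the actual memory traffic, so this only shows the dependency chain, not the timing.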

    Object oriented programming, whether accomplished with an OO-TM programming language, or just a structure full of function pointers, is about the same can of worms (internally, the processor is caching the last location of the indirect branch, so it is not substantially different from prediction of conditional branches).