> I have been known to trade things that are
> difficult to find, or so out of date that it's no
> longer sellable.
I'm sympathetic to that concern, but I'm guessing it accounts for a small minority of downloads. If you extrapolate this to a license to download anything and everything, then I suspect you're just using it as an excuse to please yourself.
I'm not familiar with Canadian copyright laws, but that sounds pretty messed up.
Even if you're legally OK, morally what you're doing is still equivalent to stealing if it wasn't the publisher's intent to permit such downloading.
Good point.
From a moral point of view: people who distribute copyrighted material are violating both the letter and spirit of the law, and deserve to be punished.
From a strategic point of view: The only alternative to punishing copyright violators, short of abandoning copyright altogether, is to make violation impossible through Orwellian DRM backed up by even more Orwellian legislation, or by hamstringing the Internet in some other way. I don't want to lose my freedom and my technology because some punks thought they should be allowed to download music without paying for it.
Yes, they're in the cache, because the cache miss penalty on Itanium is so horrible that it needs a huge cache to get competitive performance. What's your point?
Maybe your point is that a slow chip with a huge cache is a good design point for massive-scale shared-memory machines. Fine, but it's stupid to design a CPU for that tiny market.
> ia-64 is the most dissimilar, but only because
> everyone else is doing exactly the same stuff.
> Does the really include any design features not
> present in some form in ?
eh?
> x = a->b->c also stumps hardware pre-loading.
Right, but an out of order machine (anything except Itanium) will be able to run ahead executing instructions that don't depend on the value of x, while waiting for the value to come back. Itanium just sits there doing nothing.
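To make that concrete, here's a minimal C sketch of the dependency structure I mean. The struct and function names are made up for illustration, and this only shows which work depends on which load; it doesn't measure anything, and how much overlap you actually get depends on the core and the compiler.

```c
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct inner { int c; };
struct outer { struct inner *b; };

/*
 * x = a->b->c is two dependent loads: the address of ->c is not known
 * until the load of a->b has returned. The loop over vals[] never reads x,
 * so it is work that can in principle proceed while those loads are
 * still outstanding.
 */
int chase_and_sum(const struct outer *a, const int *vals, size_t n)
{
    int x = a->b->c;            /* dependent chain: load a->b, then ->c */
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += vals[i];         /* independent of x */

    return x + sum;             /* first real use of x */
}
```

The only point is that the loop and the pointer chase share no data until the final add.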
> itanium 2 doesn't do next-line prefetching, but
> it does read 2 bundles of instructions per
> cycle.
I was talking about data prefetching. If it doesn't do next-line code prefetching then I don't know what's going on in Santa Clara; those guys need help.
> Your contention is correct that itanium doesn't
> solve all the problems that face a modern risc
> architecture. Does that mean that no one should
> bother trying?
I'm glad that the IA-64 architects really pushed on some cool ideas, I'll give an A for effort there. But "good effort" is not a reason for customers to buy the chips.
> Should processor makers churn out the same stuff
> and wait for moore's law to make things faster?
> Hope that multi-core cpus will somehow be better
> utilized than smps?
How to use all those transistors is the perennial question of architecture research. All I can tell you is that IA-64 is the wrong answer.
I have a dual SMT Xeon at home which Linux sees as a 4-way box. I get plenty of use out of it. In the truly demanding application domains --- games and servers --- it's not that hard to make use of threaded parallelism.
> Really solving the problem would require some
> radical design that completely undermines
> current methods of programming.
We're working on it. But it's probably not that dramatic.
> Actually, the companies that create "stringing
> vast arrays of processors together" machines
> tend to only waste their time on high powered
> processors.
Correct, because the cost of the processors is a small fraction of the cost of building such a machine so they're not worried about processor price-performance.
> Itanium processor are very inexpensive
> processors compared to them [RISC processors].
Yes. But that's not the comparison I was making.
> x86 do really well at int based benchmarks, but
> are enemic on floating point.
Yes, Itanium2 performs well vs x86 on nice regular FP codes like SpecFP. But the server market, which is the real target market for Itanium, is mostly irregular, integer and memory intensive code.
> As a matter of fact, if there was a market for
> games on itanium2
Yeah, it's ironic that Itanium could actually make a good games machine! If it wasn't so expensive.
> If you do a comparison on price/transistor,
> you're only seeing a $7.8/mill transistor for p4
> vs $9.1/M transistor for Madison 4M based
> processors.
That's because the P4 is a pretty bad (transistor-heavy) design too.
Peak flops is a meaningless number. High peak flops means nothing more than you can do "1 + 1 = 2" very very fast. It says nothing about the performance of real code.
HPC is expensive, no doubt about that. It's also very low volume compared to the Web server, application server, datacenter server, file and print server, etc business market.
They're only in a different class because you and Intel say so. Actual customers buy Opterons and Itania to do the same sorts of things. (And Athlon64, while it's targeted at a different market, runs the same software and is largely the same internally as Opteron, so AMD gets the volume advantage.)
You're repeating the original press releases from 1999. What we've learned since then (and everyone except Intel and HP knew before) is that predicting branches, load addresses and schedules at compile time, without much runtime knowledge, is far harder than it is for the chip to do it at run time, no matter how smart your compiler is. Much of the time, it's just impossible.
Predication's nice, but it wastes resources when you can predict branches accurately, which you can most of the time. And the big bottleneck is not branch misprediction pipeline flushes (~30 cycles), it's cache misses (100-1000 cycles). That's where Itanium really hurts.
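A rough C analogue of the trade-off, as a sketch only. The function names are hypothetical, and real IA-64 predication is an instruction-set feature that C doesn't express directly; a compiler is also free to turn either version into the other.

```c
#include <stdint.h>

/* Branching version: only one arm executes, which is cheap when the
   branch is predicted correctly. */
int32_t clamp_branch(int32_t v, int32_t limit)
{
    if (v > limit)
        return limit;
    return v;
}

/* Branch-free version: both candidate results are kept around and one is
   selected with a mask. That is roughly the shape of predicated code,
   which does the work for both arms and throws one result away. */
int32_t clamp_select(int32_t v, int32_t limit)
{
    int32_t mask = -(int32_t)(v > limit);   /* all ones if v > limit, else 0 */
    return (limit & mask) | (v & ~mask);
}
```

Both functions return the same value for every input; the second one just never branches, at the price of always doing the extra work.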
But I know that people will keep talking about the "forward-looking" "greater headroom" IA-64 architecture right up until it gets cancelled.
Predication and explicit speculative loads were primarily added to the IA-64 architecture because they'd decided not to implement out-of-order execution (dynamic scheduling) and other dynamic techniques*. They aren't nearly as important on a modern superscalar processor.
(* Itanium2 doesn't even do next-line prefetching!)
Explicit speculative loads were a major mistake because in many kinds of code the compiler cannot place speculative loads far enough ahead of the actual use for them to pay off. Often the address to be loaded from is simply not known far in advance of the load (consider executing the C code "x = a->b->c"). So Itaniums spend a lot of time stalled waiting on memory accesses. That's why Intel spends so many transistors on gigantic on-chip caches, to try to reduce that pain. The architecture's pretty good for workloads with very regular and compiler-analyzable access patterns (regular number crunching, SpecFP) but it's bad for everything else (servers, user applications, irregular numeric codes).
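Here's a small C sketch of the two kinds of access pattern, with hypothetical names chosen for illustration. It shows where the load addresses come from and says nothing about actual timings.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Regular pattern: the address of a[i] is just base + i * sizeof(int),
   so a compiler can work out every load address well ahead of its use. */
long sum_array(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Irregular pattern: the address of the next node is only known once the
   load of p->next has completed, so there is little room to hoist the
   next load ahead of time. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

The first loop is the SpecFP-style case; the second is closer to what server and application code tends to look like.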
Yes, IA-64 is an aggressive, radical, clean and somewhat novel design, so it's understandable that some geeks love it. However, it is not a good design.
If it was a good design, then with Intel engineering, 5x the transistor count, and no backward compatibility requirements, it would be absolutely crushing Opteron performance. Instead it is merely competitive.
BTW it is quite odd to consider IA-64 a small tweak over RISC chips. IA-64 is the most dissimilar of all viable architectures today.
Yeah but the Athlon64 and the Opteron run the same software and are very similar internally. AMD gets to reuse most of the design work and amortize costs over a lot more processors. So it is reasonable to count them together.
> Heck, it's not even a
> serve-your-shitty-perl-app-over-the-web"
> processor.
Well that's too bad for Intel, because that's where the money is.
The 64-bit mode in AMD's 64 bit chips actually cleans up the x86 architecture quite a bit. See my comment above.
Athlon64 and Opteron are nearly the same chip and run the same software. AMD gets to share design and manufacturing costs between them. So shipping 10x-100x more AMD64 chips than IA-64 chips means that AMD's costs will be much lower per chip and the chips will be much cheaper. So it really does make sense to compare the volumes of AMD64 vs IA-64.
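The volume argument is just amortization. A toy C model, assuming a simple "fixed costs spread over volume, plus a per-unit cost" formula; the function name and any numbers you feed it are placeholders, not real AMD or Intel figures.

```c
/* Hypothetical cost model: fixed costs (design, masks, validation) are
   spread over the number of chips shipped, then the per-unit
   manufacturing cost is added on top. */
double per_chip_cost(double fixed_cost, double unit_cost, double volume)
{
    return fixed_cost / volume + unit_cost;
}
```

With the same fixed and unit costs, shipping 10x the volume cuts the fixed-cost share of each chip by 10x, which is the whole advantage of sharing a design across product lines.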
> The Intel fortran, C, and C++ compilers for the
> Itanium for Windows and Linux are pretty godlike
> in my experience.
I'm sure they're good, but they're not good enough.
> Look at AMD benchmarks and usually they are done
> with the Intel compiler.
That'd be the x86 compiler, not the Itanium compiler.
> Your definition of sucked differs from mine.
Stringing vast arrays of processors together to build supercomputers tells you almost nothing about the performance of the individual processors.
You'd have done better to quote SpecFP numbers, but see my comment above about relative transistor count. IA-64 just doesn't give much bang for the buck (or transistor). If you strap a jet engine to a pig, sure it'll fly.
> Coke does the same thing over RC cola. Windows
> does this over OS X.
Yeah, and we don't have to like it.
Its performance is really bad considering how honking huge the chip is. Itanium2 (Madison) is 500M transistors. Opteron is slightly over 100M. Not to mention the price...
x64's 64-bit mode fixes quite a few of the problems of x86 as well as giving you 64-bit support. For example, a number of useless old instructions are no longer supported (they still work in x86 mode of course). It increases the number of general purpose registers from 8 to 16. Using SSE2 to do floating point, you get a reasonable floating-point instruction set with 16 registers. If you squint a bit it looks like a decent instruction set which just happens to have a weird instruction encoding.
Yes, the decode stages are a pain (though trace cache helps), but in return you get significantly higher instruction density than competing RISC chips which helps with your instruction cache.
OTOH the IA-64 architecture was designed around unfounded implementation assumptions like "we won't be doing out-of-order execution". Sorry, WRONG. Sometimes polishing up old junk gives better results than designing completely new and differently broken junk.
I don't believe that the cost for the chips is anywhere near parity.
We can only speculate.
I've heard rumours that Intel wanted to do something radical in the architecture because it would be harder for other vendors (AMD) to clone. That could have forced them into their VLIW design.
When IA-64 was conceived (mid 90s) some research groups (e.g. IMPACT at Illinois) were touting in-order VLIWs with compiler support as the way of the future. Their research had problems but perhaps some key Intel/HP engineers bought into it.
Now imagine that the IA-64 project got rolling and after a few years you've aligned the company around the project and sunk a billion dollars or two into it. Maybe you've even talked it up in the press or with analysts. Many of your best and most senior engineers have staked their careers on the project. Now suppose some of your people have doubts. How hard would it be for them to persuade the company to flush it? Near impossible, I suspect.
It's scary how close we all came to watching AMD go under and IA-64 taking over in spite of its inferiority. It would have been a terrible example of monopoly power leading to bad outcomes. Fortunately at this point it's only a matter of time before IA-64 is cancelled. It can't compete with x64 chips which are essentially equivalent but ship in 10x-100x the volume.
1) When the IA-64 design first became public, it was clear that they'd made some incredibly poor decisions. For example, the architectural design was based on the assumption that the chip would not do out-of-order execution in hardware. Such deficiencies were to be remedied by a god-like compiler that would emerge at some later date. Unsurprisingly, it never has.
2) These predictions were borne out by the fact that Itanium performance has always sucked, especially considering the enormous die size, cost and heat dissipation.
3) It looked like Itanium might win in the market despite its technical limitations, just because of Intel's vast marketing budget, its momentum, and its monopoly leverage forcing OEMs to stay away from technically superior alternatives like AMD64.
4) Thankfully this hasn't happened. The technically superior, open solution is winning. Thanks AMD.
> So we can find ourselves in a situation where
> one popular browser's (or rending engine) tics
> and weirdness dictates how to write webpages
> like IE does now?
As a core Gecko developer, I promise you that we are committed to fixing any tics and weirdnesses that deviate from published Web standards, and this will remain true even in the unlikely event we find ourselves with a monopoly. For Web developers, this means that if they rely on bugs of ours that deviate from Web standards, then we will eventually break their content.
Because we're open source, you don't even have to trust me. If you ever feel that Mozilla.org is abusing its position, you are welcome to gather followers, fork the code and carry the project on in whatever direction you wish.
No, Mozilla Suite has never had an autoupdate feature.
In fact, Firefox now supports automated updates. It will automatically update the entire browser if we push out such an update.