Domain: realworldtech.com
Stories and comments across the archive that link to realworldtech.com.
Comments · 215
-
IA-64 isn't the best choice
To start off, there's an error in your question. There's no such thing as ``the IA-64 CPU.'' IA-64 is the instruction set architecture behind Itanium and the around-the-corner McKinley, and while I could list all the features and shortcomings behind it, it'd be a boring and technical explanation.
You can jump to conclusions, if you wish, by looking at Ace's Hardware SPECmine database, which contains all current SPEC2000 results. In case you don't know, those are the industry standard benchmarks. Here is a sample query of the SPEC2000 integer benchmarks, and here is a sample query of floating point results. As you can see, there are always better choices than the Itanium, and mostly from vendors who have been in the field of 64-bit processors for long, and probably with better prices too. You can never underestimate how important vendor reputation and experience is.
I alluded to the shortcomings of the instruction set of Intel's 64-bit offerings. Indeed, the poor performance can mostly be attributed to it (although the Itanium's poor design has helped a lot -- let's see whether McKinley fares better.) The truth is, Intel took VLIW and redressed it as EPIC; but VLIW has never been a panacea. Serial designs with out of order execution have been around longer, and worked great. The Itanium is a strict in-order processor, and the SPECint results show. And compiler technology isn't there yet; but Intel has acquired Alpha from Compaq and employed their compiler and MPU design team, widely reputed as the best in the field. Whether clever design will be able to mend the instruction set flaws, only time will tell. Indeed, the best strategy now is to wait a few years; seeing as how hardly a thousand Itanium system were actually purchased and paid for, most people seem to be taking this route. With McKinley the situation will be better for Intel, but not until the 3rd or 4th generation will the mass purchases begin, if ever.
Another common misconception is about the performance of Sun hardware. Just look at the values linked above from the SPEC benchmarks. Sun is known for scalable, reliable hardware, stuff you can depend on. But they're not the best performers by any account. The best designs come, undoubtedly, from the Alpha team; the upcoming EV8 would be the most advanced processor for a long time to come, if it weren't for Intel (who cancelled the EV8 project, obviously.) Unfortunately, Alpha is no longer a good buy, though not by technical merits. Having a vendor who will vanish sometime in the future is never a good strategy. Luckily, IBM's POWER series and HP-PA still remain, although the Precision Architecture will be discontinued some time in the future as well. IBM's Power4 design is the current king of the hill in SPEC scores, beating the closest competitors by a fair amount.
Finally, a great source of information is the Real World Tech forum, and the ``Silicon Insider'' columns by Paul DeMone, on the same website. (Paul also regularly reads the forum, and posts quite frequently too.) -
Re:Initial DesignsThe other reason why the Pentium 4 isn't performing as well as it should is that it was built for fast streams of data into it's SSE2 execution units. The problem with the L2 cache isn't really that it is well undersized, it's that the x86 execution units suck altogether. The other problem is the 20-stage pipeline (which was required to ramp the speed up well beyond 1.1Ghz which they were stuck at with the 0.18-micron Pentium III) and they are still using the relatively poor x87 FPU units. The Athlon's FPU units blow away the x87 FPU units both the Pentium III and the Pentium 4 (the PIII and the P4 use almost the same FPU units).
Ace's Hardware and Real World Tech have great discussions and articles on the downfalls of the Pentium 4 processor. -
Re:oops
Get your details here. If you haven't read any of Paul DeMone's Silicon Insider columns, but you like computer architecture, you need to check it out. Well written, pretty much unbiased, and usually quite insightful.
-
Re:Silent fans get louder
How about a VIA C3 (Ezra) CPU and mainboard? These can run with a passive heatsink -- no fan. This can run office productivity software okay, but they are poor for FP-heavy number cruncher software. Anyway, here is a review: [realworldtech.com] -- Mike
-
Re:Cheap Linux box
How about a VIA C3 (Ezra) CPU and mainboard? These can run with a passive heatsink -- no fan. I don't know if Linux distributions work well with them, but they run office productivity software okay. They are poor for FP-heavy number cruncher software. Anyway, here is a review: http://www.realworldtech.com/page.cfm?section=rwt
l abs&AID=RWT100401013841 -- Mike -
This could have been Compaq/EV8...
But unfortunately the Q was undermined by its inability to think outside of the x86 PC and primarily by its moron-of-a-CEO Michael Capellas.
-
Re:IA64 is the "heir apparent"Sounds like you haven't heard about a little chip called the Power4
Plus Sun sure hasn't rolled over either, Sparc performance has always been subpar, but they make up for it with a good OS (Solaris) and tons of applications.
-
In other news...
In other news, AMD has changed its company name to Cyrix...
(BTW, the link above is actually a pretty interesting diatribe about why Cyrix's performance ratings stunk.)
-
There's history here.
It's astonishing how an article could spend that long talking about Intel suing VIA over a chipset which introduces a new DRAM type to an Intel CPU and not mention Intel's PC133 fiasco of two years ago.
For those who don't know, the only reason PC133 exists (as a PC standard DRAM type) is because of VIA. Flashback to early 1999: Intel had the market for chipsets (for Intel processors) almost completely to itself, riding on the enormously successful 440BX chipset, which used PC100. However, P3 speeds were ramping up while memory speeds had been stuck at PC100 for a couple years. The obvious thing to do was to update the BX to support a 133MHz FSB. After all, it was a dead-simple engineering trick (every BX mobo at the time could easily overclock to 133; many were stable up to 150), and the memory makers were already making SDRAM which could safely run at 133 but clocking at 100 because that was the highest official speed.
But instead--and unbeknownst to most of the techie world at that point--Intel had a contract with Rambus which offered them many goodies like the ability to make RDRAM controllers royalty-free (others paid up to 5%) and lots and lots of stock options. However, the contract was contingent on, among other things, Intel agreeing to do everything reasonably in their power to prevent "next-generation DRAM" types other than RDRAM from being paired with Intel processors for the consumer desktop. "Next-generation" was defined as > 1GB/s bandwidth.
PC133 has a bandwidth of 1.066 GB/s.
Moreover, Intel *thought* it was putting the finishing touches on the ill-fated RDRAM-only (at that point) i820 (Camino) chipset, with which they were going to introduce new and badly needed 133MHz FSB P3s. Instead, engineering delays involving the difficulties of getting RDRAM working (eventually they had to settle for only 2 RIMM slots instead of the original 3, a per-channel limitation which remains to this day), and the difficulties of getting a memory translator hub which allowed PC100 to be used on the i820 (a last minute addition when they realized people weren't exactly going to pay $500 for 128MB of RAM) working, pushed the release date back 6 months or so, until November.
Just to reiterate: Intel put off releasing 133MHz FSB P3s, and then when they did release them said that consumers could only use them with a buggy chipset, limited to 2 RAM slots, which offered one's choice of an extra-slow translated implementation of PC100 or of RDRAM which cost 10 times as much per bit as SDRAM. Meanwhile, tests with BX chipsets overclocked to 133 MHz FSB showed that this solution was significantly *faster* than the i820 + RDRAM chipset!
Into this world stepped VIA offering the Apollo133 chipset, the first P3 chipset explicitly designed to use PC133. Nevermind that it was probably *less* stable than an Intel BX overclocked to 133 MHZ FSB. Nevermind that it underperformed the BX@133 as well. And nevermind that then, as now, Intel sued VIA with all their might, among other things requesting injunctions forbidding all VIA products from leaving Taiwan. (The pretext then was that VIA was abusing Intel IP by using the P3 bus with a DRAM type Intel had not sanctioned.)
VIA quickly gained > 50% of the P3 chipset market.
Indeed, the only reason you see ALi, SiS, and soon-to-be nvidia and others getting into the 3rd-party chipset market is because VIA paved the way a couple years ago.
Intel tried every FUD tactic in the book, from suing in multiple jurisdictions to claiming that PC133 SDRAM was not stable (the DRAM itself! And this from the company which had spent the past year patching bugs with RDRAM!). Intel got their ass handed to them in court, and by in the summer of 2000 introduced the i815, essentially the BX@133 product they should have introduced in late 1998.
Intel doesn't like getting humiliated, though, and they've had a seemingly personal vandetta against VIA ever since. In retaliation, they denied VIA the chance to license the P4 bus, as ALi and SiS and (interesting) ATi have done. (This is the basis for the current *threatened* suits. However, it's interesting to note that the P4X266 is currently shipping and no suits have yet been filed, meaning this is probably just a bluff on Intel's part.)
Intel reps were even seen at the recent Comdex show threatening mobo makers who had VIA promotional balloons flying at their booths. All the balloons were taken by the Intel people.
However, Intel's case this time is as flimsy as last. Disregarding potential antitrust concerns, the fact remains that NatSemi, whom VIA recently purchased *did* have a license for the P4 bus, and thus so does VIA.
So does this mean VIA will have similar success as last time? Well, I think they'll easily prevail in court if it comes to that, although it appears that Intel may be playing this one all FUD and no bite: warning mobo manufacturers not to use the P4X266 rather than actually filing any lawsuits. While of course not stated in the article, the well-documented fact is that Intel is telling the mobo makers that if they use the VIA chipset they will have their allocation of Intel's SDRAM (and soon-to-be DDR) P4 chipset, the i845, curtailed or dropped altogether. The result will likely be that only the third-tier mobo makers, who probably wouldn't have gotten a Brookdale allotment anyways, will be using the P4X266.
But another reason VIA won't snap up the P4 chipset market is much more hopeful. SiS' DDR Athlon chipset, the 735, has earned rave reviews, significantly beating every other chipset around. Their upcoming 635 chipset for the P4 will offer all that and more, including support for 333MHz DDR (PC 2700) which is coming down the pipeline now.
And they *do* have a P4 license. -
Most widely used architecture?
The best selling 32-bit microprocessor family, at least as of 1999, was ARM. Paul DeMone has a solid article with plenty of historical and technical detail.
-
Re:Troubles
VIA Chipsets (KT133A) are also causing problems on *gasp* windows machines. Check out Real World Tech's latest industry update (found here). Its on the second page in the first paragraph. Basically if you stick an SB Live soundcard on some of the KT133A based mobos you get, you guessed, data corruption.
I guess this just adds to the cannon fodder
-
Re:Not transferring all of AlphaIt's called SMT -- Simultaneous Multi Threading.
Hop on over to the excellent Paul DeMone articles here and check out the 3 that start with "Alpha EV8"...
-
Re:Never saw the point of hardware review sites.
Everyone who reads hardware sites should check it out "Benchmarking: The Money Game" at RealWorldTech.com. They make similar points about how meaningless many hardware site reviews are.
-
Re:Speaking of IronyI think the main advantage of Mac hardware is in the market that Motorola targetted the G4: embeded systems. A laptop is pretty close to an embeded system, or at least has similar requirements: low power and small space. A G4 uses a lot less electricity to do the same amount of number crunching as most other processors that are at least as fast, x86 or otherwise. (i.e. not counting StrongARM/XScale or stuff like that.) Running without a fan is nice. My 486 laptop does that. When I feel like getting a new laptop, I'll probably try to get one based on the powerpc (or StrongARM, if anyone makes them!).
As for absolute performance, Mac fans should really stop deluding themselves. Everyone has their reasons for buying what they buy, so of course one is not automatically an idiot for buying a Mac that is more expensive and slower for general purpose computing than a new PC. Anyway, please peruse Silicon Insider, especially Apple's Power Failure
One thing you'll notice if you have a PPC is that compiling stuff takes longer than on x86. gcc just plain does more work when generating ppc code than x86 code. (i.e. generating ppc code takes longer than generating x86 code, using a cross compiler.). Presumably, this is because it has a lot more registers, among other differences, so there is a lot more stuff for gcc to consider while optimizing. Another thing is that, especially on my quad PPC604 mac clone from Daystar, compiling doesn't scale very well. I think the fact that the 512kB L2 cache is shared among the 4 CPUs, and has to be accessed over a relatively slow bus, really kills things. CPU-bound tasks that don't need to touch a lot of memory do a lot better. This has improved a great deal with recent Macs, especially the G4 which has great SMP support. (the G3 and G4 have per-processor L2 cache, and have the tag-check hardware on-die, even though the main SRAMs are external.)
Well, that was one of my more disorganized and wandering posts...
:)
#define X(x,y) x##y -
Re:Speaking of IronyI think the main advantage of Mac hardware is in the market that Motorola targetted the G4: embeded systems. A laptop is pretty close to an embeded system, or at least has similar requirements: low power and small space. A G4 uses a lot less electricity to do the same amount of number crunching as most other processors that are at least as fast, x86 or otherwise. (i.e. not counting StrongARM/XScale or stuff like that.) Running without a fan is nice. My 486 laptop does that. When I feel like getting a new laptop, I'll probably try to get one based on the powerpc (or StrongARM, if anyone makes them!).
As for absolute performance, Mac fans should really stop deluding themselves. Everyone has their reasons for buying what they buy, so of course one is not automatically an idiot for buying a Mac that is more expensive and slower for general purpose computing than a new PC. Anyway, please peruse Silicon Insider, especially Apple's Power Failure
One thing you'll notice if you have a PPC is that compiling stuff takes longer than on x86. gcc just plain does more work when generating ppc code than x86 code. (i.e. generating ppc code takes longer than generating x86 code, using a cross compiler.). Presumably, this is because it has a lot more registers, among other differences, so there is a lot more stuff for gcc to consider while optimizing. Another thing is that, especially on my quad PPC604 mac clone from Daystar, compiling doesn't scale very well. I think the fact that the 512kB L2 cache is shared among the 4 CPUs, and has to be accessed over a relatively slow bus, really kills things. CPU-bound tasks that don't need to touch a lot of memory do a lot better. This has improved a great deal with recent Macs, especially the G4 which has great SMP support. (the G3 and G4 have per-processor L2 cache, and have the tag-check hardware on-die, even though the main SRAMs are external.)
Well, that was one of my more disorganized and wandering posts...
:)
#define X(x,y) x##y -
Re:i860-based board?
Did anyone else do a doubletake on this? Or did I just miss a motherboard chipset in the 8xx series that happened to have the same number as Intel's old 32-bit mil-spec CPU line?
You may have missed it because it's not out yet--the i860 (codename "Colusa") is the chipset for the upcoming 1-2 proc P4 Xeon (codename "Foster"; actually, in an effort to be hugely confusing, the official name is now simply "Xeon") systems which should be released in a couple months or so. FYI, the 4+ proc chipset will be designated the i870.
And yes, it is a bit odd that Intel is recycling this rather inauspicious brand number. I suppose not many in the industry have a long memory.
(For more info on the first i860, Paul DeMone had an interesting article at RWT comparing its ambitious but flawed design to Itanium and its potential pitfalls.) -
Re: and what of the multicore PPC?You seem to be talking about the IBM POWER4 chip. Which looks to stomp the living shit out of anything and everything short of Alpha EV7. IA-64? Don't make me laugh.
Long live Big Blue!
(jfb) -
A nice article about SMT on the Alpha
A nice (and quite technical) article on the SMT for the Alpha can be found here
It is a 3-part article, click on "Alpha EV8 (Part x): Simultaneous Multi-Threat".
One of the things I like about SMT is that as it quite "cheap", it has a chance to spread quite effectively.
And when there will be a huge base of SMT ready CPUs, we will see AT LAST more software which takes advantage of parallelism (be it SMT or SMP).
-
That post doesn't deserve a 4You pay a 10% die cost for twice the throughput according to what Compaq is claiming for the Alpha EV8 (ref).
So, you're just wrong. Trying reading something about what you're commenting on first.
-
Re:BollocksI'd suggest you read this paper this paper or this paper
Pay particular note to the fact that you can take an existing superscalar chip and add SMT for only about a 10% chip real estate premium, while it should be able to double throughput. That's a lot better than trying to double throughput by adding another CPU to a machine or by adding two cores to a CPU.
Also note that it isn't recommended to run processes with different address spaces simultaneously on the processor because that would thrash the TLBs. Its only suggested that you let multithreaded apps (oracle, perhaps future versions of apache) load more than one thread into the processor at the same time.
-
Re:BollocksI'd suggest you read this paper this paper or this paper
Pay particular note to the fact that you can take an existing superscalar chip and add SMT for only about a 10% chip real estate premium, while it should be able to double throughput. That's a lot better than trying to double throughput by adding another CPU to a machine or by adding two cores to a CPU.
Also note that it isn't recommended to run processes with different address spaces simultaneously on the processor because that would thrash the TLBs. Its only suggested that you let multithreaded apps (oracle, perhaps future versions of apache) load more than one thread into the processor at the same time.
-
Mmm, content-free article...
Well, where do we start...
Multi-core CPUs like IBM's are SMP-on-a-chip, which is not the same as SMT by any stretch.
SMT, because more of the functional units on the chip are staying active at one time, increases heat and power consumption just about as much as SMP-on-a-chip, though it may be marginally better because the core-level overhead won't be present.
"SMP-aware" applications? Yeah, you need something like that with the Mac and its cooperative multitasking and wacky thread model. However, with any normal preemptive multitasking, thread supporting OS, introducing threading into a program makes it "SMP-aware" by default (though you may find new/different bugs on an SMT or SMP system).
The only thing I can think of that would be an "SMT optimization" at the application programming level would be threading any floating point calculations separately from integer calculations, thus allowing the FP units to be running independently from the rest of the application.
Methinks Vince needs to bone up at little...
Some more links:
Universiity of Washington SMT info, this is also linked to from the UMass link previously posted
Look at some more Alpha specifics from the source
I believe these Real World guys were quoted in the last Slashdot SMT reference (and look, Hemos posted that one too... you'd think they'd read the links...>
Disclaimer: I worked for Compaq (though not the DEC side) on porting an OS to Alpha a couple years ago and having to be aware of SMT in EV8 coming down the pike. -
Re:Just one question... why?A lot of people are assuming that multiple processors can be put on the same die for "equal or less cost". This simply isn't true.
Sharing the cache is hard
Cache is the vast majority of chip area in a modern processor; as others have pionted out, it's obvious that multiple processors should share a cache. However, this is difficult. The problem is that every load/store unit from every processor must share the same cache bandwidth.
Thus, for a 2-way chip with only a shared cache, memory latency---to the cache, the best possible case---is cut in half.
We can work around this by using various tricks up to and including multiported caches---but most of these tricks increase latency (lowering maximum clock speed) or require much more circuitry in the caches (we were sharing the cache because it was so big, remember?).
It makes much more sense to share the circuitry that feeds into the cache.
Those are the superscalar execution units! Thus, SMT.
Utilization
Instead of keeping half the execution units busy, we attempt to keep them all busy. Extrapolating very roughly from Figure 2 we can expect to issue about half as many instructions as we have issue slots (actually less if we have a lot of execution units). The basic idea is we can cut the number of empty issue slots in half each time we add a new thread. Further, instructions from separate threads do not need to be checked for resource overlaps---this circuitry is the main source of complexity in a modern processor.
What's happening now has been predicted for a long time. The extra resources (a bigger register set, TLB, extra fetch units) required for multithreading are now cheaper than the extra resources you'd need (mostly pipeline overlap logic) to get a similar increase in single-threaded performance.
SMT easier than SMP?
Moving thread parallelism into the processor is actually easier for the compiler and programmer; the weak memory models implied by cache coherence models aren't an issue when threads share exactly the same memory subsystem.
To get an idea for how hard it is to really understand weak memory models, consider Java (which actually tries to explain the problem to programmers---in every other language you're on your own). Numerous examples of code in the JDK and elsewhere contain an idiom---double-checked locking---which is wrong on weakly-ordered architectures. What's this mean? Your "portable" Java code will break mysteriously when you move it to a fast SMP. Alternatively, you will need to run your code in a special "cripple mode" which is extremely slow.
From a programmer's perspective, SMT (as opposed to SMP) architectures will be a godsend.
-
Ripped off Ars Technica lately?
Similar story with the same links is posted on the Ars Technica frontpage today, except Hannibal of Ars is the one that did the research and came up with these articles at Realworldtech.com. At least "Dean Kent" managed to read a few of the links a bit before submitting this info as his own. Perhaps next time you can submit a link to the rightful author's page rather than bypassing it and claiming his hard work as your own.
-
Benchmarks miss the point!
The FPU in the P4 is there for x86 compatability. Intel is betting that software developers will use some of the P4's 144 new instructions to accomplish floating point operations. The new instructions, if used properly, could realize significant speed increases.
Further, as was mentioned earlier, Intel has always released new generations of CPUs that didn't exactly take the benchmark world by storm. Wait 'till software emerges that takes advantage of what the P4 has to offer. Then you can try to complain.
Didn't anyone read what Paul DeMone had to say? Or Ace's review of the P4? -
Here's one
I havent seen a single glowing review ANYWHERE
Have a look at this one. These guys are really onto it. Note that they are saying that the P4 has great potential, not that it's a fantastic processor for anyone now. Truth is I wouldn't be suprised to see most review sites using many SSE2 optimized benchmarks in a years time and then the P4 will be looking damn good. Still won't be any good for me with my legacy software though. -
Re:Business as usual.My first thought was that it's ironic that Intel are trying patents on instruction sets now, given that no one has expressed any interest in cloning IA64 at all. AMD have their own 64 bit architecture after all, and it looks more acceptable for desktop use than IA64. This is an excellent overview of AMD's plans.
Of course with all the convenient secret patent 'licenses', intel will naturally be using this exclusively to sue AMD or anyone else trying to compete with them.
Until Rambus appeared these sorts of things were almost always resolved via patent license trading. AMD have a fair portfolio of their own, I doubt they'd be afraid of anything Intel are likely to do.Nope, my guess (without having checked the details of the patent) is that this is an attempt by Intel to get some leverage over Transmeta (or anyone else) incase they want to simulate the instructions in software. I'm sure Intel would love to get their hands on some of Transmeta's patents.
-
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review or rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promot ing" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowlege this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may recieve several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, braniac features of the core strongly indicate that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alphas chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indictation that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB freqencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligable impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulf's of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegence in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Re:Intel is going the clock speed route for a reasThe new IBM POWER4 chip (a great article on the chip is here) looks really great but given the trouble that other manufacturers have had with complex chip designs I have to wonder.
- How well will it scale to high clock speeds?
- How are chip yields going to be affected by the super-complex chip design (and of course, low chip yields = high price)Still looks good though, too bad it'll be about 50 years before I can afford to have one on my desktop.
-
Re:This isn't a discussion about design philosophyUgh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificantly. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you. -
Re:This isn't a discussion about design philosophyUgh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificantly. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you. -
Re:This isn't a discussion about design philosophyUgh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificantly. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you. -
Re:This isn't a discussion about design philosophyUgh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificantly. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you. -
Re:This isn't a discussion about design philosophyUgh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificantly. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you. -
Re:Intel only CPUs
You also play down the FP performance, where the Alpha as is still kills the projected values for the (non-available) 1.5GHz P4. Intel's vaporware isn't even in the same league.
False.
Someone reported a slew of leaked benchmarks on a preproduction 1.4 GHz P4. While many of them were disappointing (more on this later), the SPEC scores reported were rather impressive. In particular, the SPECfp2000 score was 517 IIRC. That would put a 1.5 GHz P4 somewhere in the neighborhood of 550--not quite the 599 that Alpha can pull off, but certainly "in the same league", and certainly not "killed".
But there's more. The actual SPECfp score of the P4 will be much higher, for two reasons:
1) That preproduction 1.4 GHz P4 was almost certainly crippled in some way--i.e. one section of cache may be set to bypass, or several associativity ways turned off, or branch prediction could be turned off, or instruction paths could be routed very conservatively in microcode, or any number of dozens of things. This is almost always true of prerelease benchmarks of a new core; it happened before the Athlon release last year, and before the PPro release so long ago. Furthermore, there is good reason for prerelease chips to be crippled in this way--this is how engineers test finished cores to make sure they're ready for release. Essentially it's a lot like how coders test their code--turn something off and make sure everything works in the degenerate case; that way you not only know it will work in real operation, but if there is a bug you're much closer to isolating it. These settings are set in microcode and not changable by whichever two-bit employee at an OEM decides to leak benchmarks off a preproduction CPU sent there for validation (or whoever leaked the scores). Indeed, preventing accurate leaks is a reason to leave prerelease chips partially disabled like this.
In particular, the resulting slew of benchmarks which came from this particular preproduction P4 showed several very odd results almost certain not to represent the true performance of the fully functioning chip. It's not that the chip did poorly across the board, but rather that a couple particular benchmarks which stress particular aspects of the chip happened to be stupendously bad, and other benchmarks were mainly impressive. Paul DeMone, an EE who writes some excellent technical articles at realworldtech.com and posts often in places like Ace's Hardware, says that in his opinion these particular scores come from a P4 which has had its L1 associativity-way-prediction turned off, thus effectively increasing the latency of the L1 from 2 to 5 or so cycles. (Of course, there's no proof that the P4 even does way-prediction; the idea that it does is simply Paul's conjecture to explain how Intel got an L1 to run with 2-cycle latency at 1.5 GHz (and greater) on a .18um process. You can read his article on this here.) I'm not knowledgeable enough to comment, but Paul knows what he's talking about.
2) As I said in my original post (your accusation of "playing down fp performance" notwithstanding), the P4's SSE2 instruction set includes double-precision fp instructions for the first time. Thus we have an x86 chip which can finally run the double-precision operations necessary for SPECfp without relying on the horribly antiquated 8 register stack-based x87 fp implementation. While the 2-cycle latency L1 data cache will help the P4 run x87 code much faster, the reason all the major RISC chips have always creamed x86 chips in SPECfp is because they have sensible floating point implementations while the x86 chips are stuck with x87 for compatability reasons. But now that the P4 can use SSE2 instead of x87 instructions, it may have a real chance to compete or even win in SPECfp. We've already seen how the SPECfp performance of the leaked chip was remarkably better than that of a 1GHz P3 (517 vs. 327). What we don't know is if those numbers were made using a compiler which was fully optimized for SSE2. Indeed, it's likely that the SPECfp numbers which accompany the P4's release will later be improved upon by better SSE2 compilers.
Furthermore, classification of the P4 as "vaporware" is utterly uncalled for. This is a chip that's remained remarkably on schedule, with the only official delay being one month for a chipset (not CPU) issue. Believe it or not, Willamette is a chip that wasn't supposed to even exist, because by now the P3 was supposed to be replaced by a 2nd-generation, consumer oriented IA-64 chip. (Now *that's* slipping on your schedule!) Plus, Intel has been intentionally downplaying the performance of the P4, which is the exact opposite of what one does with vaporware. While some (who are not knowledgeable enough to study the actual P4 design) have taken this to mean that the P4 will be a disappointing performer, it in fact suggests just the opposite. Intel is not afraid to exaggerate (lie about) the performance of its upcoming processors in order to scare off competition--witness Merced/Itanium. Likewise, Intel is known to downplay the performance of upcoming MPUs in order to surprise the market when the chip finally debuts--they did this, very effectively, with the PPro, which surprised everyone in taking the SPECint lead away from Alpha when it was introduced. Intel leaked that the PPro core was supposed to be a dog; instead it's been perhaps the most impressive core design in history.
In any case, we may not have long to wait. While I'd bet we won't see any final SPEC numbers until the P4 release next month, Intel's presentation on the P4 at this week's MicroProcessor Forum is tomorrow. Will we see SPEC numbers? Who knows? But if we don't, it's likely only because Intel has a big surprise ready on Nov. 20. -
Re:err...
They sure do only have 8KB of L1 data cache. Check it out: http://www.realworl dtech.com/page.cfm?ArticleID=RWT091000000000
-- -
Athlon SMP is delayed to next year
There will be also a SMP version of the 760 DDR chipset, however sources say that the 2 way system will be ready only next year, and the status of the 4 way system is still unknown.
-
Re:Article should read: IBM kills Itanium.
Anyone remember Intel late unlamented 432? The things with Itanium sound awfully familiar...
Or the i860. -
Re:VLIW = Very Long Instruction Word
Sorry to be picky, but all modern CPU evaluate many instructions simultaneously, using multiple-issue systems for many similar functional units of eack kind. The difference with VLIW is that several instructions are held in the same instruction word.
More to the point, most modern CPU's, the chip itself determines which instructions should be run simultaneously. With VLIW, in theory, the compiler does this. Thus (in theory) the VLIW chip can be a lot simpler, since it doesn't need to spend a whole bunch of chip area on complicated logic to manage the process of choosing which instructions to process simultaneously and on large buffers to hold many instructions while the fastest combinations are chosen.
In conventional VLIW the instructions combined in the same word must not have any dependencies on one another. The Crusoe native instruction set is VLIW in this sense. The Itanium system - which they call EPIC - is more complex, allowing instructions that interfere with one another to be combined in the same word. This ought to be somewhat easier for the compiler, but it seems in fact that elements of it are causing problems.
Ah, what happens when theory becomes practice. See, the problem with all of the above is that traditional VLIW (and it is very a traditional idea, despite Intel pretending they just invented it (and even then, it was HP who invented EPIC)) can only produce code which is known to be dependency-safe at compile time. In the fields where VLIW has been used for decades--i.e. specialized DSP stuff--this means most code. In the fields in which Intel is trying to sell its IA-64 chips--i.e. general purpose, multitasking machines (first servers and eventually workstations and even, they claim, desktops)--this turns out to be almost no code at all. It turns out that virtually nothing that a general-purpose computer tends to do has very many instructions which can be proven at compile-time never to have any dependencies on many other instructions. This compares to a normal superscalar x86/RISC processor, which has the much more fruitful task of determining which instructions are likely to be dependency free at run-time.
This presented a problem. So, with EPIC, HP came up with the idea of having long-words (or "bundles") composed of instructions which were merely likely not to have dependencies on each other. So far so good...except that then you need a bunch of hardware on the chip to determine when you have dependencies and when you don't, and to figure out what to do when you do (after all, the chip still needs to execute all those instructions). That is, all the hardware that any non-VLIW superscalar CPU needs. But wasn't the whole point of VLIW was to make the chip simpler by moving all that stuff into the compiler and keeping it there?
Another feature that most modern CPU's (so called "out-of-order designs") have is the ability to pick one side of a conditional branch--the side which is more likely to be taken--and pretend it has already been taken, executing the results before it knows they will be needed. The problem with this is that now the chip needs to keep track of which instructions are definitely needed and which would have to be "thrown out" if it turns out the wrong branch was taken. This would seem to be counter to VLIW philosophy, but because of the speed gains speculative execution offers, EPIC includes it as well. While branching "hints" are indeed introduced by the IA-64 compiler, this is still just more complexity which ends up not being removed from silicon.
Further holding back Itanium is the fact that without dynamic handling of instructions in the CPU, the chip can't take advantage of something called register renaming, which essentially allows the compiler to pretend it's using certain registers while the on the chip the programs just use whatever registers happen to be available. The upshot of this is that Itanium needs a whopping 128 14-ported (a "port" is like a line into/out of the register) 64-bit general purpose registers. More registers--sounds like a good thing, right? Problem is that 128 14-ported 64-bit registers take up such a phenomenal amount of die space that it's difficult to get them to do all the things they need to do in one clock cycle and still stay in sync. The result is that Itanium won't clock very fast without failing, and Intel had to screw with the pipeline timings late in the design process to get it to "acceptable" clock speeds at all.
The end result of this? Amazingly, in a processor whose entire design philosphy is to remove as much complicated control logic from the chip to the compiler, the portion of Itanic's die dedicated to control logic will be bigger than the entire die of an Alpha 21264 on an identical process!! Intended to be a high-volume chip released at 800 MHz in late 1997 or 1998, Itanium is going to end up a low-volume chip released at 733 MHz in 2001, if indeed it is released at all in anything more than engineering-sample quantities.
Thus everyone's hopes have turned to the next IA-64 chip, McKinley. Intended to be released in late 2001 (put me on the record as doubting it) as a high-volume, moderate clock speed chip on Intel's upcoming .13um process, it may indeed be an impressive performer. (Interestingly enough, McKinley is being designed entirely at HP...) Whether that's because of or despite the EPIC design will be interesting to see. It is, at the least, pretty doubtful that the competing Alpha 21364 and Power4 chips around by then won't be even faster than McKinley, and they'll be manufactured on less advanced processes than Intel's.
Bottom line: at this point in the game, VLIW looks like a damn neat idea that never should have made the jump to general-purpose computing. Whether IA-64 will end up dominating the world anyways will be an interesting story to see. On the one hand, the server market isn't known for rewarding well-designed processors: the lion's share of the market is owned by Sun, whose UltraSPARC-II's are horribly outdated and outclassed on a per-processor basis by everything this side of a Celeron. Meanwhile, the Alpha, universally agreed to be the fastest and most elegant processor around, is languishing in the market. On the other hand, Sun's big strength (other than marketing) is in scalability, and given that the two big OS's for IA-64 appear to be Linux and Windows-64 (laff), it seems Intel won't be able to able to compete on this measure either. -
Re:Where's the variations in hardware and software
also, how do you figure that Macs are any less "desinged for this" than x86 boxes?
Because Macs--and yes, even your "quality-built" G4--are terrible at the most important factor for web-serving performance: memory bandwidth. "Apple's systems generally have had only about 60 to 70% of the effective memory bandwidth of contemporary x86 systems. This is due to Power Mac configurations that run the system bus at lower clock rates than comparable x86 PCs, and the simple fact that Apple's system ASICs cannot match the technical excellence of the best x86 chipsets like the 440BX." (source: Paul DeMone's Mac performance article at realworldtech.com)
Furthermore, as you'll learn if you read the rest of that article, Apple refuses to submit any Macs to any standard, fair benchmarking organizations, and in particular to SPEC, instead preferring to use decade-old discredited benchmarks incorrectly (BYTEmark) or make up their own. I wonder why?
the G4 is a damned powerful, quality built piece of equipment--better than most x86 boxes slapped together at some cheap ISP.
First off, it's OEM, but I'm sure that was just a typo. More serious is your perception that the OEM does anything which impacts the performance of the computer other than pick the components. The only thing that could possibly make a computer "quality-built" by an OEM would be making sure everything is screwed in tight. What matters is that the components themselves are quality-engineered. And in the case of chipsets--again, the most important part of a good web server--a plain old Intel 440BX knocks today's Mac chipsets silly.
And let's not even get into the x86 chipsets which are actually built to be used in web servers. Apple simply doesn't have anything to compete.
And there's no reason why they should. Apple has never ever pretended their boxes make good web servers. And considering all the things Apple has pretended over the years, that fact alone should clue you in that they probably don't. -
Re:Still too early to say...
Intel and AMD's 64-bit strategies are remarkably divergent
No kidding. IA-64 is not even using the x86 ISA, while the Sledgehammer will extend the x86 ISA to 64-bits.
Intel will probably come out with an x86-compatible 64-bit processor around the same time AMD introduces theirs.
Actually, so far Intel has not announced any plans to extend the x86 ISA to 64 bits, their IA-64 uses EPIC "technology", and can not natively run x86 code. But, it will be "compatible", but not nearly as compatible as the Sledgehammer, which will natively execute older programs compiled for the 32-bit x86 ISA.
If not, AMD will be well-poised to take over the desktop market
This sort of depends upon exactly how good the Itanium (Intel/HP's IA-64 processor) turns out to be. Remember, Intel has already taken big steps to try and turn programmers toward their new 64-bit ISA, and Intel is planning the Itanium for desktops as well. If it turns out that Intel can manage to get programs desktop users are accustomed to using to be made explicitly for their IA-64 and not AMD's 64-bit x86, then many desktop users might feel pressured to switch completely away from x86. 'Fortunatly' for AMD, Intel has totally blown off Microsoft, and not included them in the development of this at all, so maybe AMD will be able to garner some support from MS.
BTW, there is a lot of info on this, starting about 10 months ago, in the Silicon Insider, located at real world tech. -
Re:The Intel 986
-
Re:The Intel 986