Domain: jc-news.com
Stories and comments across the archive that link to jc-news.com.
Comments · 46
-
Re:Incredible!
You can find them on JC's cross comparison; they're old, though.
JC's Home Page
Scroll down, on the right, under Benchmarking.
-
Regarding K8/Hammer performance features
> the Hammer/x86-64 chips have ondie memory controllers AND more registers than i386++ type
> chips, combined they'll give a speed increase of not inconsiderable proportions.
It is notable to, er, note that the former advantage helps (possibly considerably) both recoded AND legacy (e.g., normal) programs, whereas the additional architectural registers require recompilation in order to show a benefit, which means that everybody but Windows users will get immediate use outta that.
Other advantages of the Hammer? Well, not counting the 64-bit yunk (that WILL provide benefits, but I want to cover benefits that will help legacy programs, like Civilization III, The Sims, Unreal II and other antiproductivity applications):
HyperTransport. That's not much on its own, but it essentially equates to a reduction in lost bandwidth to the chipset and between processors when you add an additional processor. On the Intel setups, the processors share a set amount of bandwidth to the chipset, so putting eight chips on a 2.4GB/s bus means that each chip gets 0.3GB/s. The AMD setup theoretically lets each processor get the full 2.4GB/s. Of course, that's in a perfect world, chip-level, but it probably amounts to some benefit. AMD's K7 family has similar advantages, which probably helps explain why it gets higher performance at each given clock in mainstream applications (which at least somewhat depend on the memory subsystem) even though the AMD cpu-to-memory bandwidth was usually 2.13GB/s (now it's 2.67GB/s, unless you count stuff like the nForce, which has some extra memory bandwidth, but the extra benefit there is eaten up by the onboard video), whereas the Intel cpu-to-memory bandwidth was usually 3.2GB/s. Anyway, the idea is that HyperTransport will (on a hypothetical level) make it much easier to build n-way systems without either a tremendous performance impact or an expensive crossbar workaround setup thingy.
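A minimal sketch of the idealized arithmetic above, assuming a shared bus splits its bandwidth evenly among all CPUs and that each point-to-point link delivers its full rate. Contention, protocol overhead and cache-coherency traffic are ignored, and the function name `per_cpu_bandwidth` is hypothetical.

```python
def per_cpu_bandwidth(link_gb_s: float, n_cpus: int, shared: bool) -> float:
    """Idealized bandwidth per CPU in GB/s.

    shared=True models one front-side bus split evenly among all CPUs;
    shared=False models a dedicated link per CPU (HyperTransport-style).
    """
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return link_gb_s / n_cpus if shared else link_gb_s


for n in (1, 2, 4, 8):
    bus = per_cpu_bandwidth(2.4, n, shared=True)
    p2p = per_cpu_bandwidth(2.4, n, shared=False)
    print(f"{n} CPUs: shared bus {bus:.2f} GB/s each, point-to-point {p2p:.2f} GB/s each")
```

In this model the shared-bus figure shrinks as 1/n while the point-to-point figure stays flat, which is the "perfect world, chip-level" case the post describes.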
SoI. Silicon-on-Insulator. This is one of those things that'll help with the process technology. In the end, it'll probably offer a little bit of a frequency boost by making the chip a little cooler or something like that. I forgot precisely what SoI's primary benefits were. It's been months since I've even thought about it. :)
Stages: As detailed here, the K8 adds two stages to the decoding part of the instruction pipe. The decoding part of the pipe is probably rather complex, so you may see a pretty neat frequency boost over the K7 family without the problem of a huge branch mispredict penalty. The number of cycles that a cpu wastes when it makes an incorrect guess on a low-level "if/then" statement is roughly proportional to the number of pipeline stages. The AMD K6 and (IIRC) Cyrix 6x86 were the mack daddies of mispredict recovery, since their pipes had only five stages or so, so they only had to wait a few cycles when they zigged instead of zagged. The PIII and K7 had over ten stages, so they had to wait a lot longer, but other advantages (such as the larger and sometimes faster caches and more accurate predictors) in those processors over their predecessors did their best to overcome this disadvantage. The Pentium 4 has a crippling 20 to 28 stages (depending on the situation, on how the trace cache handles the situation, and on whether or not you want to count it). This means that it can hit amazing clock frequencies, but it'll get cranky and drowsy for twice as long when it makes a predictive mistake. How does it get away with this? Well, the trace cache does its best to assist, but it didn't really help as much as I think the designers were hoping. But for multithreaded programs and OSes, the SMT implementation on the more recent members of the P4 family, an implementation known as "Hyper-Threading", probably pretty neatly alleviates much of this problem by putting operations from other threads into the cpu whenever the currently running thread stalls on a branch mispredict. The K8/Hammer approach is just to add stages where they hopefully will have the most balanced, beneficial effect on frequency while only minimally increasing the branch penalty. SMT would be nice, but it isn't nearly as critical a need as it is on the P4.
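To make "roughly proportional" concrete, here is a textbook-style back-of-the-envelope model, a simplification rather than how any of these chips actually behaves: each mispredict is assumed to cost about as many cycles as the pipeline is deep, and time per instruction is CPI divided by clock. The branch frequency, mispredict rate and clock speeds below are made-up illustrative numbers, not measurements.

```python
def effective_cpi(base_cpi: float, branch_frac: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Cycles per instruction including branch-mispredict stalls."""
    return base_cpi + branch_frac * mispredict_rate * penalty_cycles


def ns_per_instruction(cpi: float, clock_ghz: float) -> float:
    """Average time per instruction: cycles divided by cycles per nanosecond."""
    return cpi / clock_ghz


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
BRANCH_FRAC, MISPREDICT_RATE, BASE_CPI = 0.20, 0.05, 1.0

# Assumption: the mispredict penalty is roughly the pipeline depth in cycles.
shallow_cpi = effective_cpi(BASE_CPI, BRANCH_FRAC, MISPREDICT_RATE, penalty_cycles=10)
deep_cpi = effective_cpi(BASE_CPI, BRANCH_FRAC, MISPREDICT_RATE, penalty_cycles=20)

# Hypothetical clocks: the deeper pipe is assumed to reach a higher frequency.
shallow_ns = ns_per_instruction(shallow_cpi, clock_ghz=1.5)
deep_ns = ns_per_instruction(deep_cpi, clock_ghz=2.0)

print(f"10-stage: CPI {shallow_cpi:.2f}, {shallow_ns:.3f} ns/instruction")
print(f"20-stage: CPI {deep_cpi:.2f}, {deep_ns:.3f} ns/instruction")
```

Doubling the penalty doubles the stall term (0.1 to 0.2 extra cycles per instruction with these numbers); whether the deeper pipe wins overall depends on how much extra clock it buys, and a better predictor shrinks `MISPREDICT_RATE`, which is the trade-off the post describes.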
Wider memory access. On the Sledge Hammer, if AMD's plans are still the same as when I wrote this, the memory controller (which is embedded in the cpu) will access PC2700 memory in a 128-bit configuration (ignore the "126-bit" typo on the linked page -- I can't believe I didn't notice that when I typed it nearly a year ago!), which leads to a 5.3GB/s path to memory. That's damned good, though I really think AMD should have focused on 366/183MHz (equiv to "PC2933") or 400/200MHz (equiv to "PC3200") memory instead of the 333/166MHz PC2700 that came out over a year ago. Still, servers often use memory that's clocked lower than bleeding edge in order to maintain reliability, so bleh. Anyway, 5.3GB/s isn't bad for a setup that isn't based on a shared bus.
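The 5.3GB/s figure is just bus width times transfer rate. A small sketch, assuming the usual DDR convention (two transfers per clock, so "333/166MHz" means about 333 million transfers per second) and decimal gigabytes; `peak_bandwidth_gb_s` is a hypothetical helper name. Assuming a 64-bit path at 266 and 400 million transfers per second, it also reproduces the 2.13GB/s and 3.2GB/s figures quoted earlier.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s: bytes per transfer x transfers/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# DDR moves data on both clock edges, so a 166.67MHz clock gives ~333.3 MT/s.
configs = {
    "PC2700, 128-bit (Sledge Hammer plan)": (128, 1000 / 3),
    "PC2700, 64-bit": (64, 1000 / 3),
    "266 MT/s, 64-bit (K7 figure above)": (64, 800 / 3),
    "400 MT/s, 64-bit (P4 figure above)": (64, 400),
    "PC3200, 128-bit (the alternative suggested above)": (128, 400),
}

for name, (width, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, mts):.2f} GB/s")
```

These are theoretical peaks; sustained bandwidth is always lower.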
Enhanced branch predictor. Well, that's if my notes from a year ago are accurate. If true, this'll probably overcome any mispredict-penalty disadvantage from the above-mentioned added stages.
Larger TLBs, TLB flush filter, etc. This stuff will have itty bitty advantages on a per-clock performance basis, but every little bit helps.
Larger caches. Hey, I should look this up to see what they're planning on. Is it just 512KB on-die L2, or is AMD planning on bringing it up to 1MB L2? The interesting thing about AMD's designs is that the die is really small on each processor. Remember how AMD has occasionally come under fire for processors overheating? Well, aside from a stupid lack of shutdown diodes in the past, the real cause wasn't that the AMD processors produced more heat than the Intel processors. They usually generated about the same amount of heat, often less, but their processor surface area was substantially smaller, which made the chips less expensive to produce and less likely to have defects. But when you try to push an equal amount of heat through half the surface area, you end up with a higher amount of heat per area, which equates to a higher running temperature. The funky thing about this is that you could just add a whopping huge amount of on-die cache. That'd increase performance while also increasing the surface area, but the heat production would not be substantially affected, so you'd end up with a lower-temperature processor. So the Hammer will have a higher ratio of cache to processor units in the cpu, so it won't be as much of a fire hazard. Frankly, they should have put 1MB on-die L2 onto the Thoroughbred/AthlonXP. ;)
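The heat-per-area argument can be put in numbers. A minimal sketch with made-up figures (not real Athlon or Pentium specs), assuming power spreads evenly over the die and that added cache draws only a little extra power; real running temperature also depends on the package and heatsink, which this ignores.

```python
def power_density(power_w: float, die_area_mm2: float) -> float:
    """Average heat flux through the die in watts per square millimetre."""
    return power_w / die_area_mm2


# Hypothetical chips: equal power, but one die is half the size of the other.
small_die = power_density(60.0, 80.0)
large_die = power_density(60.0, 160.0)

# Hypothetical: add 40 mm^2 of cache to the small die for only 2 W more.
small_die_plus_cache = power_density(62.0, 120.0)

print(f"small die:         {small_die:.3f} W/mm^2")
print(f"large die:         {large_die:.3f} W/mm^2")
print(f"small die + cache: {small_die_plus_cache:.3f} W/mm^2")
```

Halving the area at equal power doubles the heat per square millimetre, while mostly-idle cache area pulls the average back down.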
Crap. I need to research more on the K8. It probably changed a lot since I went into hibernation. The interesting thing is that in the last half year, I've largely moved from being a Windows 2000 power user to a Linux coder (I still use both operating systems for several different purposes, but I'm talking primary usage). I stand to be in the group that benefits most from the Hammer when it comes out, since I'll be able to './configure && make' or 'qmake -project && qmake && make' most of the programs I use and/or develop. So I'll instantly see the benefit of those extra registers. ^_^
-JC
http://www.jc-news.com/
-
The G4+ has ONE floating-point unit
-
AMD has integrity
And the biggest news: they're calling it the 2600 when it would have been called the 2700 under their old scheme. In the meantime Intel has increased their cache size and FSB speed, so calling it a 2700 would have been a disservice to the customer. They seem to be committed to integrity in the PR rating scheme. Imagine that, a marketing program with integrity. What's next, icicles in Hades?
Hopefully they can undo the damage that Cyrix did by releasing a "PR400" part that was a 400 only when compared to a theoretical Pentium with a 66MHz FSB running Doom, but that only had about the performance of a 266MHz P2 running Quake, which would have made a much more reasonable comparison for the time period.
For a much better discussion of the subject, check out JC's.
Bryan
-
Re:Is it me?
Ha!
You talk about debunking the great megahertz myth and give a link to the Apple marketing department for proof? That's rich. Too bad some of us do things other than run Photoshop.
Here is a good, though aging, list of cross-platform benchmarks. The PPC runs about 50% faster than the PIII in normal cases, about 100% faster when Altivec enhanced. Fantastic numbers. I love the PPC. If I could get one without going through Apple, I would.
So why does Apple think they're Lincoln Steffens for giving us the same exaggerations as Intel, except from a different angle? If the G5 debuted at 2.5 GHz, that "myth" angle would go away pretty quick.
-
What is it with that name
So is there any legal overlap between the Treo that is a handheld PDA/phone (which could potentially end up with an mp3 attachment) and the Treo that is a handheld mp3 player? Ohhh, wait, I see. The former has an accent over the 'e' whereas the latter's is on the 'o'. As observed by JC's
-
PIII still more powerful than PPC
I posted a list of PPC vs Pentium benchmarks a while back the last time this issue came up that shows the Pentium winning in almost every benchmark (SSE vs Altivec included).
Here's the benchmarks
As you can see, a 400-500 MHz G4 is not "roughly equivalent" to a 700-800 MHz Pentium or Athlon. The G4 gets spanked in the majority of benchmarks and by large margins.
The idea that the G4 blows away a Pentium is a Mac user fantasy; chalk it up to Steve Jobs.
-
Re:The Pentium 4 is worth the extra price.
"SSE Support: As you've stated, SSE2 code does some really nice tricks. For heavy data processing algorithms"
N.B. SSE2's new floating-point instructions only apply to double-precision code. For single precision, SSE/3DNow! do the same jobs. Need quad precision? You're in software emulation, and it's really slow.
Despite SSE2, the Athlon still rules at ScienceMark
Intel's SSE2 autovectorising compiler still has a lot of issues for general use.
-
excerpt from JC's re: Cyrix PR ratings
JC had something interesting to say about this (www.jc-news.com). Here's an excerpt; click the link to get the whole thing:
There will be some resistance against this, but let me tell you why the resistance is really there: People were not turned off from the "PR" ratings (hmmm, that's a redundant way of saying it, isn't it? "Performance Rating ratings"??) because they were there or because they failed. People were turned off from them because they did not represent actual performance. A "PR400" Cyrix chip did not run as fast as a 400MHz Pentium II. Heck, it didn't even run as fast as a 400MHz Celeron! In order to find the performance of that particular Cyrix chip, you had to take a 400MHz K6-2 and downclock the chipset from 100MHz to 66MHz (even though the Cyrix chip was running at the 100MHz chipset setting). Then, the Cyrix chip only matched the K6-2-400 as a *best case scenario*. In most other benchmarks, said chip did not even outperform equally clocked K6 chips, and in certain nontrivial circumstances, the Cyrix chip performed like much lower-frequency processors. Frankly, it was insulting to the masses, but it was insulting not because it was a performance rating, but because it was abused.
Currently, systems are sold on equally misleading ratings called "clock frequency". "Clock frequency" is not quite as bad as the abuse that Cyrix gave to their performance ratings, but it is pretty misleading. I'm surprised that system vendors have not been taken to task for customer abuse relating to the implication that frequency is an accurate determinant of processor performance between microarchitectures (especially where the Celeron and VIA C3 are involved!). So, you see, I view the Athlon model numbering scheme as being more honest to the consumer than the overly abused megahertz rating.
-
Re:Very disappointed...whatever
1. Until the apps are done, any release dates they give you are subject to change. And in standard marketing-speak, when somebody demos an app and claims it's 99% done, it's actually 75%, and if they claim it's going to be done in a month, it actually means 3-6 months. Like the original poster said, the announcements & demos fell short of what was promised.
2. The 733 model is not a new model, and what makes you think the 3-4 week ship time for the new models is wrong? It takes no more than a couple hours for a desktop box to go through the assembly and testing process prior to shipping. If the quoted time is 3-4 weeks, that doesn't mean they haven't built one yet; it means that they are waiting on parts orders to be fulfilled. Most likely, availability of the new CPUs is poor due to low yields, something that's typically encountered at the introduction of a new processor.
3. I suggest you take a look at these benchmarks. On average, a 500MHz G4 performs like a 700MHz Athlon or 800MHz PIII. In benchmarks that take advantage of Altivec, it performs on the level of a 1GHz Athlon or 1.2GHz PIII. In certain cases, it performs 2-4 times better at the same clock speed, but in others, it performs worse at the same clock speed. To me, that means an 867 with the 2 MB backside cache is probably still a little behind a 1.2 GHz Athlon in overall performance and is only faster at a small set of Altivec-enabled Photoshop filters and not much else. The fact is that MHz is still the best indicator of performance within any given processor family.
-
Re:Stupidly chosen benchmarks
Wrong. Even granting you your premise:
The first benchmark is valid because it shows how a frequently performed task may be affected by switching architecture.
The second benchmark is valid because users of an uncommon platform cannot expect to find the kind of optimizations for their programs that users of the common platform are used to.
That said, of course two benchmarks don't tell the whole story. For a bigger picture, try this collection of x-platform benchmarks. The stats are collected from many sources, so they're not always based on the same platforms, unfortunately. On the plus side, there's some altivec stuff in there, too.
-
Unfortunately slowing down for a P4 is relative
Last time I looked at Tim Wilkins' ScienceMark test suite, the highest P4 system was in 21st place and was a 1.7 GHz overclocked one. Now, sorry to all the folks out there, but Tim's suite is the real world for one type of science application (it's his PhD thesis in physics used as a test suite), rather than a bogus set of routines used to pimp a chip.
Once I finish this reply, I will return to the air-cooled T-bird on my dining room table that is as happy as can be running at 1.6 GHz. This is with a $140 mobo, a $220 chip, and a $10 fan in a good case. After I get an OS on it, I will run Tim's benchmark. I dare say, if past is prologue, it will surpass the 2.2 GHz P4 when that turkey is released.
Sorry to say, but Intel has let the marketing types run the company a bit too long. The Blue Man Group are probably going to be the folks who are blue because their investment in Intel stinks, so maybe their use in Intel's advertising is more than appropriate.
-
Re:Intel's Client and the GHZ question
True, it really does make it look like the P4 is best for computational chemistry. In fact, the Athlon rules at ScienceMark. ScienceMark
Still, Intel will be embarrassed by the fact that AMDzone is currently in third place in the UD team ratings while the Intel group is fourth.
-
Re:Not a chance in hell
What? First to *use* USB (not just put it on the board).
That's an argument about their choice of peripherals, not about their support of i/o standards. It's marketing, not engineering. (Not that marketing is not important, just a different discussion.)
First to use Firewire. Using 32-bit Nubus when PCs were using ISA slots.
Still using Nubus years after the PC had moved to PCI. Indeed, I'd count Nubus along with SCSI--in both cases Apple went with a clearly superior solution early on, but ended up being held back as the mainstream PC standards, driven by the much larger marketplace, managed to improve much faster and yet be much cheaper than what Apple used.
The "laughably inferior video card" may be so for FPS, but actually performs quite well for graphic artists. Makes me wonder why they specced it.
"The Macintosh does not have any decent 3d support, so therefore we can pretend that 3d support is not important." Any $9 graphics card is just fine for 2d, although I seriously doubt that 16 MB and a 230 MHz RAMDAC are really good enough for any serious graphic artists. The simple fact is that the Mac does not do 3d well, and that that is simply pitiful in this day and age. And no, 3d is not just used for games; you may be shocked, but there are actually graphics artists that work in three dimensions too! (They use PCs and Unix workstations.)
BTW, the only decent 802.11 system out there that can hold a candle to the AirPort system is the Lucent Orinoco system, which is slightly more expensive and a lot harder to set up.
I don't know how hard it is to set up, but IIRC for what you admit is only a slightly higher price it has a much greater wireless range.
How many makers right now are putting out machines with DDR RAM? Last I checked, not many. Sure they're ramping up, but Apple would be stupid (and possibly insane) to be on the top of the curve for every trend. Their machines would be even more overpriced and they could end up with a Rambus/Intel fiasco on their hands if they made the wrong choice. Better to let someone like Intel make that mistake and fight the battles worth fighting (i.e. the ones pretty much won already, like USB and FireWire)
As this thread was initially about system *performance* (as opposed to capabilities), let me tell you that DDR is MUCH more "a battle worth fighting" on this metric. But you have a very valid point--indeed, I agree with you completely. The thing is, what you're saying assumes that Apple will be designing and validating its own chipsets, incompatible with the real world, every time they want to add a feature. In such an environment, it is indeed not worth it to come out with a DDR chipset now. Moreover, while it would have been worth it to come out with a PC133 chipset a year ago and a DDR chipset in around 3 months' time, the fact that Apple is the one designing and validating every new chipset is the reason these chipsets are always a year behind the times--it's a very complicated process and Apple's engineers are understandably stretched thin trying to replicate the work of dozens of companies in the PC world.
That's the problem with having a vertical monopoly; there's not enough room for differentiated product lines and innovation. In the PC world, there are 2 or 3 major chipset manufacturers competing to come out with the fastest chipsets with the most new features, and another couple players who drop in to keep competition high. There are about a dozen major motherboard manufacturers, who compete to best implement these chipsets with the most features at the lowest price. Because the PC RAM market is so large, you have all the DRAM manufacturers in the world driving chipset innovation as well. Finally, because PCs are used for general purpose tasks and because there's an independent benchmarking industry in the PC marketplace, all these people know that they won't be able to get away with a single toy SIMD benchmark as an overall measure of "performance"--thus they all feel pressure to create components which actually work fast over a wide variety of circumstances. Hence the PC market is moving into 2.1 and 3.2 GB/s FSBs while the Mac is finally hitting 1.1 GB/s. Oh, and while we're on the subject, it turns out I was wrong: you won't be able to buy a G4 with on-die L2 cache until the G4+ is released in March. Only then will the G4 finally be approaching clock-for-clock parity with x86 chips (according to SPECcpu, i.e. a real benchmark suite).
Now, I'm not saying there aren't some important tangible benefits to Apple's vertical monopoly. I just don't think they're worth the drawback: machines which cost twice as much as the equivalent PC did when it was released 9 months ago.
One final word re: price/performance -- find a notebook that can compete in that area with the new powerbook. Good luck.
Here you finally have a point: the new powerbook is very impressive and indeed competitive with PCs in price/performance. One important reason why is that AMD has not yet had a viable notebook CPU for the mainstream and performance ends of the market, so therefore Intel has a monopoly over that segment and thus performance notebooks tend to cost as much as powerbooks. Conversely, Apple has seen itself frozen out of the market it practically invented with the first powerbooks, as the portable market becomes more and more dominated by corporate consumers. Thus you have a reversal of the situation in the desktop PC market: Intel is getting away with monopoly pricing, while Apple is heavily discounting to try to break back into a market they've nearly lost.
Still, no matter how I might try to talk bad about it, there's no doubt the new powerbooks are very competitive. On the other hand, the situation is decidedly *not* as Apple has presented it. Here's what Apple has to say on the matter:
Sony Vaio Z505...........PowerBook G4
12.1-inch display........15.2-inch wide-screen display
Magnesium alloy..........99.5% pure grade CP1 titanium
650MHz Pentium III.......400 MHz PowerPC G4
No optical drive.........Slot-loading DVD-ROM
2 hours battery life.....5 hours battery life
Not wireless ready.......AirPort antenna built-in
1.15 inches thick........1 inch thick
$2549*...................$2599*
(Taken from here.)
Now let's look at what the actual facts on that Sony Z505 really are.
First off, let's take note of the fact that contrary to Apple's blatant misrepresentation, the Z505 with a P3-650 actually costs $2250, not "$2549". But what's $300 among friends? Well, we can use some of that money to buy the Z505 a 6-hour battery, so hahaha on you. The cost is now $2450, or $150 less than the Mac. Also while the powerbook may be a miraculous 3.8 mm thinner than the Z505, the important measure is of course weight; the powerbook, at 5.3 pounds, is 41% heavier than the 3.75 pound Z505--which makes sense, as they really serve different purposes. Indeed, the low weight (and its huge popularity) is the reason the Z505 is so underpowered for its price (for a PC that is), but we'll disregard that for now.
Unfortunately, there's no way to buy the Vaio as unloaded as those powerbooks: in particular, no way to buy it without at least Word 2000. Nor is there any way to purchase Word 2001 with our brand new powerbook at the Apple Store. We could buy it from MS for $400 but that doesn't seem quite fair. Instead we'll upgrade both machines to Office.
Where does that put us now?
Sony Vaio Z505...........PowerBook G4
12.1-inch display........15.2-inch wide-screen display
Magnesium alloy..........99.5% pure grade CP1 titanium
650MHz Pentium III.......400 MHz PowerPC G4
No optical drive.........Slot-loading DVD-ROM
6 hours battery life.....5 hours battery life
Not wireless ready.......AirPort antenna built-in
1.15 inches thick........1 inch thick
12 GB HD.................10 GB HD
3.75 pounds*.............5.3 pounds
$2650....................$3060
*Longer battery adds weight from this original measurement, but I couldn't find out how much.
What's missing? Well, the DVD player, for one thing. An external one adds $400 to the Z505's cost, making it just a hair cheaper than the powerbook. The 650 MHz P3 is in reality a good deal faster than the 400 MHz G4, but by using the right programs an argument can be made that the G4 comes close. "AirPort antenna built-in" is a red herring, since you still need to spend $100 for the AirPort card. I looked it up, and the first place I checked had an Orinoco card for $160. Again, I'm almost positive this card has much better range than AirPort. Eh, let's look it up, shall we? Well, AirPort only goes up to a measly 150 feet. Orinoco goes...let's see...up to 1750 feet. Hmm. Guess the "built-in antenna" isn't working too well, is it??
So what do we end up with? The new powerbook is almost exactly the same price as a similarly configured Z505, except that the Z505 has a tad more HD space, has an extra hour on the battery, and, sorry to say, is the faster machine. Alternatively, you can get the Z505 without a DVD player and save $400.
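For anyone checking the arithmetic in this comparison, the sketch below recomputes the gaps from the figures quoted in the post (prices as the poster reported them at the time; nothing new is looked up). The Office upgrade prices weren't itemized, so only the totals from the second table are used, and the variable names are just illustrative.

```python
MM_PER_INCH = 25.4

# Figures exactly as quoted in the post (USD, pounds, inches).
apple_claimed_z505 = 2549
actual_z505 = 2250
z505_with_6h_battery = 2450
powerbook_base = 2599
z505_with_office = 2650
powerbook_with_office = 3060
external_dvd = 400
powerbook_lb, z505_lb = 5.3, 3.75
z505_in, powerbook_in = 1.15, 1.00

overstatement = apple_claimed_z505 - actual_z505
base_gap = powerbook_base - z505_with_6h_battery
dvd_gap = powerbook_with_office - (z505_with_office + external_dvd)
extra_weight_pct = (powerbook_lb / z505_lb - 1) * 100
extra_weight_lb = powerbook_lb - z505_lb
thinner_mm = (z505_in - powerbook_in) * MM_PER_INCH

print(f"Apple's Z505 price overstated by ${overstatement}")
print(f"Z505 with 6-hour battery is ${base_gap} cheaper than the base PowerBook")
print(f"Z505 + Office + external DVD is ${dvd_gap} cheaper than PowerBook + Office")
print(f"PowerBook is {extra_weight_pct:.0f}% ({extra_weight_lb:.2f} lb) heavier")
print(f"PowerBook is {thinner_mm:.1f} mm thinner")
```

The numbers line up with the prose: roughly $300 overstated, $150 cheaper, "a hair" ($10) cheaper with the DVD drive, 41% heavier, 1.55 pounds, and about 3.8 mm.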
Meanwhile, the powerbook has a luscious 15.2" screen, while the Z505 is stuck with a 12.1" which, while quite small, at least manages to almost hit the resolution of the powerbook (1024x768 vs. 1152x768). The benefit of giving up the nice screen and the internal DVD is up to 1.55 pounds of heft and of course that extra hour.
In other words, it's arguably a tossup. Of course it's a bad comparison because one is a sub-notebook and the other a full-sizer, but Apple chose it, not me. Still, it's worth noting that the Z505 is perhaps the most overpriced laptop around, so it's not such a surprise that Apple chose it when making a comparison.
Well phew! Aren't we enlightened? Did I pass? (It wasn't that tough, I let Apple "find a notebook that can compete in [price/performance] with the new powerbook" for me!)
Now it's my turn: find a desktop Mac that can compete in (price/1.5)/performance with a similarly equipped desktop PC--and I mean in a wide variety of benchmarks, not just Photoshop and RC5. (Indeed, it would be tough to do that even with Photoshop, assuming one actually used a complete Photoshop benchmark like PSbench.)
Good luck. Unfortunately, there are very few good cross-platform benchmarks to consult; the most well-respected cross-platform benchmark in the world, SPECcpu, shows the G4 in a rather unflattering light--indeed, because of this Motorola hasn't even released official scores for the G4, making it the only current general-purpose CPU family I can think of for which SPEC scores are not available. Oh wait, I lied: there's no SPEC scores for Cyrix chips either. However, there are SPEC scores for the P3, P4, the AMD K7, for Sun's UltraSparc II and III, for IBM's POWER3 chip which is sorta related to the G3 kinda sorta, for the Alpha EV67, and the MIPS R12000 and the HP PA-RISC 8600, just in the past year. The point is, every real chip releases SPEC scores, usually early and often. The best we have for the brand-spanking-new G4+ is an *estimate* for the outdated (in fact retired) SPEC95 suite, and man it's not too pretty. Of course, Motorola can always complain that they don't have a very good Fortran compiler, which is key to a good SPECfp score (their SPECint score sucks too, though); still, this is no one's fault but their own, unless of course they never meant the G4 or G4+ to be a high-performance general-purpose chip (oh that's right, they didn't; they built it for the embedded market).
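Part of why a broad suite is harder to game than a single filter: SPEC CPU summarizes a machine as the geometric mean of its per-benchmark ratios, so one huge win cannot carry the score. The sketch below uses Python's `statistics.geometric_mean` (Python 3.8+) on made-up speed ratios between two hypothetical machines (values above 1.0 mean machine A is faster); they are not real G4 or x86 results, and the benchmark names are placeholders.

```python
from statistics import geometric_mean

# Hypothetical speed ratios of machine A over machine B (B's time / A's time):
# one SIMD-friendly filter where A is 3x faster, five general tasks where it is slower.
ratios = {
    "simd_filter": 3.0,
    "compile": 0.8,
    "compress": 0.8,
    "database": 0.8,
    "spreadsheet": 0.8,
    "physics_sim": 0.8,
}

best_case = max(ratios.values())
suite_score = geometric_mean(list(ratios.values()))

print(f"best single benchmark: A is {best_case:.2f}x as fast")
print(f"geometric mean over suite: A is {suite_score:.2f}x as fast")
```

Quoting only the best benchmark says "3x"; the suite-wide geometric mean says the two machines are roughly even.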
Other cross-platform benchmarks are invariably much less trustworthy, because they are almost always binary-only and are never of the breadth or depth of the SPEC suite. Picking Photoshop, for example, is just plain dumb, as Photoshop is simply better optimized on the Mac than on the PC (alternatively, we could benchmark Word and see which runs it faster). There's a nice collection of published cross-platform Mac vs. x86 results here; it's worth perusing: even though most of these programs make *very* poor benchmarks on their own, taken as a whole they at least provide some semblance of a big picture. Needless to say, I think your task will be pretty difficult, even if there were a good way to compare performance across the two platforms.
-
Re:Confessions of a former Mac User
-
Dell pulls P4 system out of shootout vs Athlon DDR
JC is reporting that Dell pulled their P4 system out of a "shootout" vs the DDR Athlon systems after looking at their performance. He also has numerous other links to 760-land. It is really looking like Intel had better dump Rambust and get with the program if they are ever going to sell anything with the P4. Likewise, Dell had better re-consider its Intelicide policy and start making AMD machines, especially when the multi-processor version of the 760 goes commercial. They have held the server market because Intel was the only multi-processor game in town. This is going to change soon bigtime.
-
Re:You have way too much time on your hands, frien
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review nor rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promoting" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microprocessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year-old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
Second, it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still, he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed.
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
Again, don't get me wrong: the Athlon is clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
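The FSB-versus-DRAM mismatch described above comes down to transfers per second times bytes per transfer. The sketch below works it through under nominal figures (64-bit data paths, EV6 at 200 MT/s, PC133 at a nominal 133 MHz, DDR266 at 266 MT/s); the function name is just for illustration:

```python
def peak_mb_per_s(mega_transfers_per_s: float, width_bits: int) -> float:
    """Peak bandwidth in MB/s: transfers per second x bytes per transfer."""
    return mega_transfers_per_s * width_bits / 8

# All three paths are 64 bits (8 bytes) wide; nominal transfer rates.
ev6_fsb = peak_mb_per_s(200, 64)  # 1600 MB/s: EV6 front-side bus at 200 MT/s
pc133 = peak_mb_per_s(133, 64)    # 1064 MB/s: PC133 SDR SDRAM
ddr266 = peak_mb_per_s(266, 64)   # 2128 MB/s: DDR266 SDRAM

# With SDR memory the FSB can move more than the DRAM can supply,
# so the extra FSB bandwidth mostly sits idle; DDR266 closes that gap.
for name, bw in [("EV6 FSB", ev6_fsb), ("PC133", pc133), ("DDR266", ddr266)]:
    print(f"{name:8s} {bw:6.0f} MB/s")
```

These are peak numbers only; sustained bandwidth on real boards is lower for all three.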
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster as far as memory performance goes, though more expensive.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 KB L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512 KB L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
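A rough sketch of the numbers behind the L2 argument, assuming one full-width L2 transfer per core cycle. The 10 ms time slice is a hypothetical round figure chosen for illustration, not a measured scheduler setting:

```python
def l2_bandwidth_gb_per_s(bus_width_bits: int, core_mhz: int) -> float:
    """Peak L2 bandwidth in GB/s, assuming one full-width transfer per core cycle."""
    return bus_width_bits / 8 * core_mhz / 1000

tbird = l2_bandwidth_gb_per_s(64, 1000)        # 8.0 GB/s: 64-bit L2 bus at 1 GHz
coppermine = l2_bandwidth_gb_per_s(256, 1000)  # 32.0 GB/s: 256-bit L2 bus at 1 GHz
p4 = l2_bandwidth_gb_per_s(256, 1500)          # 48.0 GB/s: 256-bit L2 bus at 1.5 GHz

# Timescale check: at 1 GHz there are 1,000,000 cycles per millisecond,
# so a hypothetical 10 ms time slice spans 10 million cycles. A context
# switch happens at most once per slice, while the 11-vs-7-cycle L2
# latency difference is paid on every single L2 hit.
cycles_per_slice = 10 * 1_000_000
print(tbird, coppermine, p4, cycles_per_slice)
```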
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--e.g. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256-bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff. -
Asymmetric Multi Processing?
JC's ran a story about the AMD Reseller Conference a while back. What really caught my eye was the following quote (on the second page): "Also, according to the tech guy the multi-processor boards will be able to use processors with differing speed grades (i.e. a 700MHz and a 900MHz processor running on the same board simultaneously)".
Can someone who knows about operating systems, especially process scheduling, comment on that a bit? Can users be sure that the most CPU-hungry thread/process is run on the fastest CPU?
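One naive way to approach the question, purely as an illustration: place the heaviest task first, each on the CPU where it would finish soonest given the work already queued there. This is a toy greedy policy (the function and task names are hypothetical), not a description of how any real kernel schedules; a real scheduler also has to handle tasks that block, priorities, and the cost of migrating work between CPUs.

```python
def assign_greedy(demands: dict[str, float], cpu_mhz: list[float]) -> dict[str, int]:
    """Toy heterogeneous-CPU placement: heaviest task first, each to the CPU
    on which it would finish earliest given the work already queued there."""
    queued = [0.0] * len(cpu_mhz)  # work already assigned to each CPU
    placement: dict[str, int] = {}
    for task, work in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        # Finish time on CPU i if placed there: (queued work + this task) / speed.
        best = min(range(len(cpu_mhz)), key=lambda i: (queued[i] + work) / cpu_mhz[i])
        queued[best] += work
        placement[task] = best
    return placement

# A 700 MHz and a 900 MHz CPU, as in the quote above.
print(assign_greedy({"encoder": 10.0, "editor": 1.0}, [700.0, 900.0]))
# {'encoder': 1, 'editor': 0}
```

Note that even this simple policy only puts the hungriest task on the faster CPU when the load is known in advance; real workloads change from moment to moment.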
--
-
Re:SMP vaporware
A couple of quotes from the recent AMD Reseller Conference:
"A couple of other interesting things from the conference. According to the technical presentation the multi-processor board will be able to support only 2 processors on the Northbridge. We were told that the current Athlon chip has no issues with multi-processor functionality; the issue lies in the way that the Northbridge chipset was designed. Also, according to the tech guy the multi-processor boards will be able to use processors with differing speed grades (i.e. a 700MHz and a 900MHz processor running on the same board simultaneously). AMD will use the LDT bus to connect multiple Northbridge chipsets to allow multi-processing with more than 2 processors."
"As was pointed out previously, the North Bridge only supports 2 CPU's; however, the LDT bus supports multiple North Bridges, so 4, 8, and 16 CPU (and theoretically more) configs are possible once the LDT bus chipsets come out, mid 2001..."
"The LDT bus runs at 800MHz and 1.6GHz clock rates, which means it can move 6.4 Gigabytes of data per second each way, 48 times the bandwidth of, and at a lower latency than, the current PCI bus..."
"Based on their diagrams, the CPUs will communicate with the 'memory hub' across the EV-6 bus (at 266MHz, moving to 400MHz eventually), and the memory hub will communicate with peripherals across the LDT bus, which includes the I/O Hub, at 800MHz (moving to 1.6GHz eventually)" -
1 THz Processor Announced!
Coming soon:
1 THz Processor Announced! -
Re:Err, PowerPC? AMD/Intel/Via 1 GHz Smoked?
Jobs' presentation provided a Photoshop (TM) shootout between a dual-processor 500 MHz Power Mac G4 and a single processor 1 GHz Pentium. As expected, the PowerPC finished the test in about half the time it took the PC.
That's because the test was, as expected, rigged. That is, it only used a certain set of filters which happen to run faster on PPC than on x86. It would be quite easy to pick a different set of filters and "show" that the PIII is faster than the G4 clock-for-clock on Photoshop. (Not to mention the fact that Photoshop is perhaps the only mainstream program better optimized for the Mac than the PC.)
A fairer Photoshop benchmark (and using Photoshop as your sole benchmark is pretty shortsighted, to say the least) is PSBench, which runs not 3 specially selected filters like Steve did, but a full 21. The results? A 500 MHz G4 is a bit slower than an 800 MHz P3. A dual 500 MHz G4 is probably not much faster than a 1 GHz P3, and certainly no faster than a (cheaper) dual 800 MHz P3.
For a rather exhaustive look at G4 vs. x86 benchmarks, try here. The upshot? A G4 500 is maybe as fast in raw integer and FPU speed as...a PIII 400. That is to say, the G3 was about equal with the PII clock-for-clock; however, the Coppermine PIII's have since added some stuff which the G3/G4 can't match--namely, a much faster L2 cache and 133 MHz FSB.
Where the G4 really shines, of course, is in those programs which can take advantage of AltiVec--and indeed, those are about the only benchmarks you'll find on that page. (You won't, however, find any gaming benchmarks, because the Mac would of course be "unfairly" limited by its lack of good graphics cards.) In raw SIMD-plus-FPU, a 500 MHz G4 performs about as well as...well, it depends, but a fair guesstimate would be a PIII 750 or an Athlon 650. If you look at the page, you'll find that the Mac wins quite a few benchmarks, and that one or both of the x86 chips wins most of them, and that the margins of victory vary widely.
Suffice it to say, though, that even if you do run Photoshop all day, the performance of Apple's hardware is not a good reason to buy a Mac. With the exception of Seti@Home and RC5 (but not OGR!), there is a significantly cheaper PC which will run any program faster. This isn't to say there aren't other good reasons to buy Macs. But when one platform's top chips double in speed in a year, and the other's only go up by 50 MHz, you can bet that the first platform is going to be faster. -
Computers as religion
Make no mistake, apple is a cult. You don't break the rules, you transgress against the will of the founder.
I understand that mac users are forbidden from visiting http://www.jc-news.com/parse.cgi?pc/benchmarking/xplat/ppc-x86 and learning the truth about how the g4 really stacks up against the p6 and athlon.
Those who break the rule are required to say 10000 "hail steves" and an "our founder".
--Shoeboy -
Re:Just wondering...
Are there any numbers on performance versus Intel Linux based systems?
This page should help you compare the different platforms... it doesn't come from Apple, though. So it might shock a bit.
The Linux to avoid is MkLinux, since it uses an older Mach kernel that serves to slow everything down. You don't need Mach to run Linux. It is, however, the only one that supports the older PPC 601 NuBus machines.
blessings, -
Re:Low clock at first is Intel SOP
Why is this done? I believe Intel knows they can get people to buy things at the speeds and prices they set. [Last I checked, they're still making a profit on CPUs, so this strategy is still successful.]
Actually, in the last quarter more than half of their profits came from investments and not from their "core operations":
http://www.jc-news.com/pc/?peek=20000718
So maybe in the next few years Intel will transform from a tech company into an investment firm :-) -
Re:another article
And here's one more : Rambus Impressions, From A Technical Standpoint
-
Re:Actually, The K6-3 was stopped a few months ago
Quite true, but not for the reasons you claim; there was ALSO supposed to be a K6-3+. Guess why it was canned? I'll let a much smarter guy put you in the know:
http://www.jc-news.com/pc/index.cgi?search=K6-3+cache&peek=200002
Check the second article on the page; it explains the problems AMD has been having quite succinctly...
To try to give you the full story, the K6-3+ has been on and off the AMD roadmap for a couple of months; it is currently on, BUT there is no sign that these cache problems have been solved...
-----------------------------------
Jeff Coulter
Geek in the clouds
Virtuoso - Smart Personal Agent
Jeffcoulter@users.sourceforge.net
ICQ: 33011156
-----------------------------------
"He who will not reason is a bigot; he who cannot
is a fool; he who dares not is a slave."
- Sir William Drummond
-
They were air-cooled.
I agree that stability seemed to be quite a problem--they didn't mention any testing they did; it seems that they only ran it for bragging rights, even if the processor was probably really unstable. I read at JC's that they were air-cooled.
-
I found more AMD stuff
According to jc-news.com the processor was actually an Athlon Thunderbird and the cache is on die. But I also found some conflicting stuff at a very nice current and future processors page that states that the on-die cache is only going to be on the Spitfire. Wonder who is right?
-
Why L1 is faster than L2
On the Coppermine (PIII 'E'), when the processor asks for a specific portion of memory, if it's in L1 then it takes 3 cycles to be retrieved. If it's in L2 instead, then it takes 7 cycles (or so?) to be retrieved.
That's basically the difference. They both 'tick' at the same clock rate, but one just happens to be able to deliver data in less than half the time.
This is why I'm always pissed at people who ignore every other factor when they refer to "full speed" cache. I mean ... if you had L2 cache running at 800MHz on an 800MHz processor, but that L2 cache was only 8 bits wide and took eighty cycles to retrieve a piece of memory (eg, making it probably even slower than SDRAM), should you really refer to it as "full speed"?
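A toy model makes the width-versus-latency point concrete. It assumes a 32-byte cache line, a first chunk arriving after the access latency, and one further bus-width chunk per cycle after that; real caches use critical-word-first and other tricks that this ignores, and the function name is just for illustration:

```python
import math

def line_fill_cycles(latency: int, width_bits: int, line_bytes: int = 32) -> int:
    """Toy model: the first chunk arrives after `latency` cycles, then one more
    width-sized chunk per cycle until the whole cache line has arrived."""
    chunks = math.ceil(line_bytes * 8 / width_bits)
    return latency + chunks - 1

print(line_fill_cycles(7, 256))  # 7: a 256-bit path moves a 32-byte line at once
print(line_fill_cycles(80, 8))   # 111: 80-cycle latency plus 31 more byte-wide beats
```

Both caches in this example "tick" at full core speed, yet one delivers a line roughly 16 times sooner than the other.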
-JC
PC News'n'Links
http://www.jc-news.com/pc
PS: Apologies ... in an earlier post, I referred to Willamette's bus as being 128-bit. I was very incorrect. The correct width is 64-bit (incidentally making Willamette's bus less cool than I thought). -
Re:what does any of this have to do with Crusoe?
Actually, it's even simpler than that. Northwood is just the mobile version of the Pentium IV on a 130nm process. Saying that Intel is whipping this up in a panic is silly, since this is a perfectly natural evolution in their roadmap.
We've known about their 130nm process, and we know that all their x86 chips get put into a mobile format. Also, we know that Willamette's core will be released as the Pentium IV very late this year.
FWIW, the Willamette is Intel's first new core since the P6 back in the mid 90s, which found itself inside the Pentium Pro, II, III, and Celeron. Rumours and admissions declare the below to be its likely improvements:
- A deeper (or smoother, at least) instruction pipeline for stronger frequency ramping
- Added execution pipelines and functional units (eg, allowing it to issue more instructions per cycle)
- A "trace cache", to optimize the order in which instructions are fed into the pipelines (I admit, I could be screwing up this particular explanation)
- variable frequency units -- this one's a leap of faith, but tentatively according to some sources, different parts of the cpu will be clocked at different rates. A 2GHz Willamette chip might have a 2GHz integer unit, a 1.7GHz fpu, and a 1.5GHz SSE unit (mind you, this is a speculative example, with numbers picked out of the air)
- An improved motherboard bus, capable of 200MT/s (100MHz, double pumped) but also 128-bit, twice the width of the Athlon's EV-6, allowing for twice the peak bandwidth. Also, quadruple pumped (400MT/s) for the later server version, codenamed Foster. On the other hand, it will still be a shared bus, which is supposedly less "clean" than a point to point protocol that the Athlon uses, meaning that high-way SMP may get lots of collisions and degraded performance on Willamette.
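The peak-bandwidth claim in the last item is transfers per second times bytes per transfer. The figures below are the ones from this list; note that JC's PS in another comment corrects the Willamette bus width to 64 bits, which halves the Willamette and Foster numbers:

```python
def bus_gb_per_s(mt_per_s: int, width_bits: int) -> float:
    """Peak bus bandwidth in GB/s: transfers/s x bytes/transfer (1 GB = 1000 MB)."""
    return mt_per_s * width_bits / 8 / 1000

willamette = bus_gb_per_s(200, 128)    # 3.2 GB/s, as rumored in the list above
athlon_ev6 = bus_gb_per_s(200, 64)     # 1.6 GB/s, half the rumored Willamette figure
foster = bus_gb_per_s(400, 128)        # 6.4 GB/s, quad-pumped server part
willamette_64 = bus_gb_per_s(200, 64)  # 1.6 GB/s if the bus is 64 bits wide after all
print(willamette, athlon_ev6, foster, willamette_64)
```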
The 130nm process basically will decrease the size of the features on the processor. Basically, imagine drawing stuff with a big fat marker, then getting a nice, fine pen. You can make much more detailed drawings, right? It's the same thing here. Benefits of going from a 180nm to a 130nm process:
- Area of processor will be chopped in half, allowing for more than twice the amount of raw processor die to be fit onto one of those round fab wafers.
- Because yield falls off exponentially as die area grows, yields will improve
- More on-die cache can be added with less risk of yield crash
- processor can use lower voltage and much lower power dissipation
- processor can be clocked much higher, generally a boost of 50-70% in the long run, but that's just my WAG.
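The first two bullets can be checked with back-of-the-envelope math. A minimal Python sketch, assuming an ideal optical shrink (area scales with the square of the feature size, which real shrinks rarely hit exactly) and the textbook Poisson yield model, where the chance that a die escapes every killer defect is exp(-defect density x area). The 200 mm^2 die and 0.5 defects/cm^2 are made-up illustration values, not Willamette data.

```python
import math


def shrunk_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Ideal shrink: die area scales with the square of the linear feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability a die contains zero killer defects."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


# Hypothetical illustration values, NOT real Willamette figures.
AREA_180NM = 200.0  # mm^2
DEFECT_DENSITY = 0.5  # killer defects per cm^2

area_130 = shrunk_area(AREA_180NM, 180, 130)
print(f"die area: {AREA_180NM:.0f} -> {area_130:.1f} mm^2 "
      f"({area_130 / AREA_180NM:.0%} of the original)")
print(f"yield:    {poisson_yield(AREA_180NM, DEFECT_DENSITY):.0%} -> "
      f"{poisson_yield(area_130, DEFECT_DENSITY):.0%}")
```

With these made-up inputs the die shrinks to about 52% of its original area and the defect-free fraction rises from roughly 37% to roughly 59%. Combined with the extra dies per wafer, that works out to roughly three times as many good dies, ignoring wafer-edge effects.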
-JC
PC News'n'Links
http://www.jc-news.com/pc -
Re:Can't wait. What will it be?
JC's News and Links has a compilation of rumors at http://www.jc-news.com/pc/parse.cgi?processors/Transmeta/avant/start -
Why is it so much slower than DCypher CSC???
Aargh, this is annoying. I mean, I heard rumours that the beta client was a little slowish, but I just benchmarked the d.net CSC client on my machine, and I got about 380.4kkeys/sec cracking rate (it's a K6-2-300).
Is there a problem with some systems, software-wise, or is there a bug in the d.net implementation of CSC? I tried out the DCypher.net CSC client (it's been out for a week or so, I think) and got 732.6kkeys/sec on the same system, under the same circumstances!
Actually, I checked and I'm finding similar comparisons from various people I know, with d.net's CSC client being about half as fast in cracking compared to DCypher.net's (a friend of mine tried both on an Athlon-650 and got 2,023,437keys/sec for DCypher and 1,040,189.47keys/sec for d.net, for example).
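To put a number on "about half as fast", here is a tiny Python sketch that just divides the keyrates quoted above. Nothing here is measured anew; it only restates the two data points from the post.

```python
# Keyrates quoted in the post, in keys/sec: (DCypher.net CSC client, d.net CSC client).
KEYRATES = {
    "K6-2-300": (732_600.0, 380_400.0),
    "Athlon-650": (2_023_437.0, 1_040_189.47),
}


def speed_ratio(faster: float, slower: float) -> float:
    """How many times faster one client is than the other."""
    return faster / slower


for cpu, (dcypher, dnet) in KEYRATES.items():
    print(f"{cpu}: DCypher is {speed_ratio(dcypher, dnet):.2f}x d.net; "
          f"d.net reaches {dnet / dcypher:.0%} of DCypher's rate")
```

Both machines land at roughly 1.93x to 1.95x, so the gap looks consistent across the K6-2 and the Athlon rather than tied to one core.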
My testing is being done with all programs except the shell and systray (and a dos box, as I'm using the command line clients) closed and out of memory.
Is this an optimization issue? Will d.net release improved clients in a few days? I'm really getting worried and annoyed. I had planned to do a DCypher-CSC/d.net-CSC comparison on my website to show which was faster on which of a variety of cpu cores. But this is insane!
Oh ... ummm, I guess it's only fair that I link to DCypher, as they're kinda the underdog here and not as many people know about them.
-JC
PC News'n'Links
http://www.jc-news.com/pc -
JC's page...
JC (at http://www.jc-news.com/pc/) made a good point on his page the other day about this, I'll quote it here...
"Register put up a very interesting bit here. It's about a surprise Willamette introduction in February of 2000 ("paper launch" in December, chip actually appearing two months later, according to the article). I passed this by despite the fact that a good ten percent (slight exaggeration, but you get the idea) of y'all emailed the URL to me. It just doesn't seem likely, considering the design, to our collective knowledge, hasn't taped out (and if it did, it was likely recently). Takes about a year from tapeout to production. You do the math. However, as I said, I wasn't going to put up a link to it, but I just realized something (thanks to Jocelyn Fournier, I think, for nudging me in this direction). The specint95 score of the P7-1100 shown at that register article is utter crap. If it is really the case that it is that slow, then Willamette will be pretty pathetic for servers, especially if you consider the 1MB on-die L2. The quoted score is 43 at 1100MHz. By my guesstimations (with the help of idiot from Ace's), an Athlon at 1100MHz would score between 50 and 55 (perhaps subtract a point or two for dropoff from linearity), depending on whether or not you optimize for prefetching. This means that Athlon pastes these alleged Willamette scores in specint. Actually, from the look of it, given Intel's Coppermine presentation at PF, it seems that Coppermine is also faster than Willamette in specint. I didn't check at all with the Winstone score, but as you can see, if Register's data is true, then it isn't really great news for Intel. I don't know about you, but I'll prefer to believe the more reasonable assumption that Willamette will come out in 2000 Q4 (or 2001 Q1) but will be totally rippin' in performance." -
AMD's specific financial problems
Just a quick comment on that: AMD doesn't have a problem with finance management, at least not any more than your typical company. Their problems are largely rooted in the following two factors:
1) AMD's sixth generation processor design was put together decently, but with a very shallow pipeline. This means that with your typical ramping schema, it should be at about the same MHz level as the Cyrix chips (300MHz) or the WinChips (250MHz). As it is, AMD has an immensely aggressive ramping team which has managed to bring AMD's K6 family to just under Intel's P6 family in MHz, which has a couple of effects:
(a) Because the K6 family has been historically about two clock bins lower than the P6 family, and because Intel's pricing schema involves tremendous gulfs between the top two clock bins and all below it, AMD's cpu Average Selling Prices could not help but drop lower and lower as time progressed.
(b) Due to the K6's shallow pipeline and the fab team's unmatched (and absolutely necessary) aggressiveness, the bin split of the K6 family parts is HORRENDOUS. Before AMD's recent jump to their cs44e7 hybrid process (quarter micron with some 180nm features), the top bin being produced was 475MHz and the bottom bin was still way down at 333MHz or so, with over half the parts still binning below 400MHz. This hurt their ASPs even more, as anything below 400MHz was under a hundred bucks, which means something like only fifty dollars profit per chip, at best.
(c) As a result of the aggressive ramping they needed (to compete with Intel's more easily rampable design), yields were kept lower than comparable Intel parts (though for the most part not horrendous, save for the little "incident" in February). This means that they get a lower quantity to sell than they could have gotten otherwise, which means that, on top of the low ASPs, they're bringing in very little revenue.
2) There really is no way to get past problem 1a without making a newer cpu core with a deeper instruction pipeline. And to get past the problem in 1b, while that newer cpu core will help, it'd really be the wiser choice to expand your capacity, so AMD has forced themselves to spend a whopping, Intel-like amount of money (in R&D and in building a whole new megafab) so that, while they hurt in current quarters, they can thrive in future quarters. Will this strategy work? It's not guaranteed, but it's a hell of a lot cooler than the old "play it safe" mentality. If AMD had played it safe and not done all this fab or R&D stuff, then they'd have easily made profits (I believe) off the K6 series in every quarter of 1998 and 1999. The only problem is that they'd be lagging in clock speed at this point and they'd have no real future technology with which to compete. In effect, though they'd be profiting, they would be writing their own tombstone. The way they're doing it now, they've lost lots of money but they *finally* have superior technology to work with. Even without that newer fab, as soon as they ramp K7 to at least 60% capacity, they'll be making a pretty solid profit. With the newer fab, they'll be able to profit very nicely and retroactively fund these projects that they so unsparingly dumped cash into all these years. They'll also be able to afford their future plans, which is a nice byproduct.
-JC
PC News'n'Links
http://www.jc-news.com/pc
PS: This stuff is largely my opinion, though I believe it to be largely based on fact. It isn't merely a pipe dream that leads me to believe that the K7 is the first design since the 486 that offers everything AMD needs to absolutely thrive in the market. -
Perhaps, how about the G4?
As has been pointed out, the "Pentium Toaster" ads only used the Bytemark benchmark, which is extraordinarily old and has very little relevance to the sorts of things CPUs do these days. For one thing, it includes no floating point tests at all, IIRC--and these days, most things the average user (i.e. no compiling) runs into that'll tax his/her CPU are floating point dependent. And furthermore (as also pointed out before), the MacOS hampers performance considerably. And if you want to do any sort of multitasking at all, it hampers it hilariously. Obscenely, even. Check out this article at the always impressive Ars Technica for some ROFL confirmation. To be fair, this was benched before OS 8.6, which allows (gasp!) multithreading...but if I understand correctly, apps need to be rewritten to take advantage of it anyways.
As for real, cross-platform, general CPU benchmarks, there's pretty much only SPEC, limited as it is. Yes, to some degree it depends on issues of compilers, chipsets, RAM, etc. But it's good enough to at least be relevant.
Apparently, as far as SPEC95 goes, the G3 is about 14% faster/MHz than a P3 in SPECint, and about 9% slower/MHz than a P3 in SPECfp. Course, the G3 doesn't come anywhere close to the P3 or K7 in MHz terms anyways.
And double of course, what really matters is app performance. Here, assuming one stays with the MacOS, we run into some serious problems. Essentially, ClarisWorks (now AppleWorks?) was way more optimized for Mac than PC (duh), and it showed. Photoshop is probably equally optimized for both--and no, contrary to what you've heard, it isn't necessarily faster on the Mac. Look a bit further in the Ars article above: turns out that while the Mac wins the Gaussian blur test w/64 megs RAM...it loses with 128 megs on a 100MB file...it loses on the lighting effects (FP intensive) tests...and, this is the big one, it takes 3 times longer to load the 100 MB TIFF in the first place. Whoops. And as for, say, anything made by the Microsoft Corporation, don't even bother: to say it's better optimized for PC is the understatement of the year. IIRC, MacOffice97 worked by just porting the relevant Win95 APIs and keeping the program itself the same. MacOffice98 might be better...but not by too much.
Obviously, none of the above applies to K7 vs. P3 discussions--except, of course, for 3DNow(!) stuff, but by now almost all video card drivers etc. are quite well optimized for 3DNow, and with AMD having the fastest chip on the market, that'll only improve. In any case, just read all the K7 reviews above, and you'll see that this thing doesn't just chew up the P3 in one or two CPU benchmarks...it whups it handily in just about everything. Synthetic benchmarks, Winstone, games, encoding, rendering, you name it. And this is before apps are optimized for it (new 3DNow instructions, the 3-issue FPU, etc. could all benefit from optimizations).
[Note: from here on out, I'm pretty much talking out the ass of this page here on JC's News. Dunno how accurate it all is, but JC generally knows quite a bit about what he's talking about. And I've read some other stuff that backs him up.]
Now about the G4...well, it seems that the design goals of the G4 were to get SPECint 20 and SPECfp 20 @450MHz (I've heard this elsewhere, although I don't recall a mention of 450 specifically)--implying that it will run at around 450 on introduction. Now, the K7 at 600 beats both of those marks handily, and indeed if you assume, as JC does, that SPEC scales linearly (course it doesn't, but...) then clock for clock the G4 is just a hair slower at SPECint (and exactly the same at 500MHz as the G3. Anyone else out there know if the G3 and G4 have exactly the same integer unit?), and a bit faster at SPECfp. Note that I'm not sure if he's using old guesses at the K7's SPEC marks, or real numbers...and I'm too lazy to figure it out right now.
Now, of course the target goals for the G4 were made back when they said it'd be coming out...well, by now. Instead it's going to ship in "Q3 1999"--where Q3 1999 is read, "January." So we can expect higher MHz on intro than 450.
Of course, by then, the Athlon'll be at 750 at least. Probably 850. Rumor has it 1GHz. We'll see. In any case, JC goes ahead and pits a projected G4-550 against a (projected?) K7-750...and guess what, the K7 is a hell of a lot faster.
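The per-MHz argument above is easy to reproduce. A rough Python sketch, assuming the G4 design targets from this comment (20/20 at 450MHz) and, as an assumption about which K7 numbers apply, the K7-550 figures of 25.1 SPECint95 and 22.5 SPECfp95 reported elsewhere in this archive. It's not necessarily the calculation JC's page used, and linear clock scaling is the same simplification the comment already flags.

```python
def per_mhz(score: float, mhz: float) -> float:
    """SPEC score per MHz, for a crude clock-for-clock comparison."""
    return score / mhz


def project(score: float, from_mhz: float, to_mhz: float) -> float:
    """Naive linear clock scaling (real SPEC scores scale sub-linearly)."""
    return score * to_mhz / from_mhz


G4 = {"int": 20.0, "fp": 20.0, "mhz": 450}  # G4 design targets
K7 = {"int": 25.1, "fp": 22.5, "mhz": 550}  # K7-550, assumed reference figures

for kind in ("int", "fp"):
    print(f"SPEC{kind}95 per MHz: G4 {per_mhz(G4[kind], G4['mhz']):.4f}, "
          f"K7 {per_mhz(K7[kind], K7['mhz']):.4f}; "
          f"G4-550 ~{project(G4[kind], G4['mhz'], 550):.1f} vs "
          f"K7-750 ~{project(K7[kind], K7['mhz'], 750):.1f}")
```

With these inputs the G4 comes out a hair behind per MHz in integer and ahead in floating point, matching the comment, yet the projected K7-750 still leads the G4-550 on both.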
On the *other* hand, the real wild card in all of this is the G4's AltiVec vector processing unit (for those who don't know, vector processing is the sort of thing Crays (used to?) use--very very good at many things that normal CPUs use floating point to do). On paper, it totally totally kicks ass. Like orders of magnitude faster than SSE/3DNow. And from what I hear, it'll be way easier to optimize for than SSE/3DNow, and waaaaaay easier than MMX (which required assembly programming)--i.e., it might just require a recompile with the "optimize for AltiVec" box checked.
On the other other hand, with the recent emphasis on nonupgradable machines (with comparatively poor 3D acceleration) in their consumer line, and a reported general lack of attention to gaming among Apple bigwigs (course, this was in some ZDnet story, so who knows if it's true), the amazing power of AltiVec might only end up being used in embedded DSP machines by Motorola and IBM.
On the fourth hand, if I had an iBook I could surf the internet while I was in the bathroom. Draw your own conclusions. -
check this impressive review site about K7
on this page there's a link to every known review, with ratings, very cool!
--
http://www.beroute.tzo.com -
Athlon mobos not for general consumption yet
Possibly K7 mobos will be available in mid-late August for DIY channel. Check out JC's excellent K7 motherlode.
-
Re:When?
According to JC's PC News, SMP K7 boards should be available Q1 of 2000 (though I don't know if that'll be the 8-way boards). Also in his archives he has the K7 RC5 score using various cores (as there's no K7-optimized core yet). It seems to get keyrates equivalent to a P6 core when using the P6-optimized code. Presumably the RC5 folks will make a K7-optimized version in the future, though.
-
I doubt Intel will do this...
I think I agree more with this guy.
-
Re:Which is faster?
The simple answer: K7 (Athlon). Far and away.
The real figures are about 12% faster in SpecInt95, and about 50% faster in SpecFP95. That's comparing a P3-550 versus an Athlon-550. For the P3 Xeon-550, the Athlon beats it by about 5% in integer, and 40% in floating point.
If you want the full Athlon story, go to JC's page. More Athlon info than you ever wanted to know, including the above spec numbers and where they were obtained. -
K7 Spec numbers
JC over on JC's News managed to get spec marks on a K7-550 with 1/3 speed L2 (the shipping version will have 1/2 speed L2). Check out the numbers and other good stuff yourself, but it scores 25.1/22.5 on SPECint/fp. As you have noted, though, real world performance doesn't always scale with SPEC -- although it isn't totally out of touch with reality.
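A quick sketch of the naive linear scaling the K7 SPEC discussions lean on, using the 25.1/22.5 figures above (measured with the 1/3-speed L2, so possibly a little pessimistic for the shipping part). Real SPEC scores scale sub-linearly with clock, so treat the projections as rough, optimistic figures. The 43 SPECint95 for an 1100MHz Willamette is the alleged Register figure quoted in another comment in this archive, included only for comparison.

```python
def scale_linear(score: float, from_mhz: float, to_mhz: float) -> float:
    """Naive projection: assume the SPEC score scales linearly with core clock."""
    return score * to_mhz / from_mhz


K7_550_INT, K7_550_FP = 25.1, 22.5  # K7-550, 1/3-speed L2 (from the post)
WILLAMETTE_1100_INT = 43.0  # alleged P7-1100 SPECint95 (Register rumour)

for mhz in (600, 1100):
    print(f"K7-{mhz}: SPECint95 ~{scale_linear(K7_550_INT, 550, mhz):.1f}, "
          f"SPECfp95 ~{scale_linear(K7_550_FP, 550, mhz):.1f}")
print(f"Willamette-1100 (alleged): SPECint95 {WILLAMETTE_1100_INT:.1f}")
```

Scaled to 1100MHz the K7 lands at about 50 in SPECint95, the bottom of the 50-55 range in JC's estimate, and comfortably above the alleged 43.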
:) And yes, the microwave frequency is right up there around 1GHz ... I think the K7 will be in the 60W range, which is certainly better than my microwave (~800W, if I read the back of it correctly). I wouldn't run a bare processor on my desk and stare at it, though. -
Recording of the dinner presentation available
Posted by DanTucny:
A recording made of the dinner presentation on the K7 where these latest performance figures were announced is available from JC's page (www.jc-news.com/pc); it is in RealAudio format and has a few mirrors available...
Of course, you've probably already been there and heard it :) -
Re:K7 clock speed
The K7 will be released in 500, 550, and 600MHz variants. This has been heavily hinted since November, and was confirmed by the CEO of AMD (Jerry Sanders) himself at an annual shareholders meeting (I think that's when it was).
The L2 cache of the K7 will run at half the clock of the processor. The 1/3x MHz idea was put together because AMD wasn't certain that the SRAM market would be able to supply 300MHz SRAMs for the K7-600's L2. Thankfully, this is not a problem.
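The SRAM concern is plain division: the off-die L2 runs at the core clock divided by the cache ratio. A minimal Python sketch for the three launch speeds mentioned above.

```python
LAUNCH_SPEEDS_MHZ = (500, 550, 600)  # K7 launch bins from the post


def l2_sram_mhz(core_mhz: int, divider: int) -> float:
    """Clock the off-die L2 SRAMs must sustain for a divider of 2 (1/2 speed) or 3 (1/3 speed)."""
    return core_mhz / divider


for core in LAUNCH_SPEEDS_MHZ:
    print(f"K7-{core}: 1/2-speed L2 needs {l2_sram_mhz(core, 2):.0f}MHz SRAM, "
          f"1/3-speed L2 needs {l2_sram_mhz(core, 3):.0f}MHz SRAM")
```

Only the K7-600 at half speed needs 300MHz parts; at 1/3 speed the same chip gets by with 200MHz SRAMs.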
Incidentally, Kryotech's Super-G will be out this year, likely at 1GHz in Q4, with a hypercooled K7. It *will* be expensive, but it will be *worth it*. AMD will have two 180nm processes ready by Q4, which will make the K7 a lot cheaper to make and a lot more voluminous (i.e., there will be more of them). Figure that you might see an 800MHz K7 by end of year if AMD deems it necessary; that's one great core for MHz!
-JC
PC News'n'Links
PS: K7 and mP6 look to be the fastest current cores for rc5, per MHz. They may both be faster than the mighty K5, once optimized for. -
Re:PRICE???
The K7 is going to be priced comparably to the Pentium III, not the Pentium III Xeon, from what I've been told. The estimates among my local group are:
$400 or slightly above for 500MHz
$550-ish for 550MHz
$700 or so for the 600MHz version, though they may want a more respectable (i.e., high) premium for the fastest x86 processor of all time
These prices are mostly slightly higher than our extrapolations of PIII pricing around late July, when K7 will start to pick up volume. Despite the performance delta, AMD will likely price the part within reach of high-end consumers, plus they want to pummel down Intel's high end ASP so they choke on their own Celerons.
AMD's DDR L2 "Viper" version of the K7, in Slot-B, will compete against Xeon. It will also happen to destroy Xeon in spec -- even more utterly than regular K7 does. Cascades looks like it'll be toasted a bit, too, unless Intel puts up a surprise and gives it 1MB of L2 on-die.
BTW: K7's integer score beats out HP's mighty PA-8500 (which has 1.5MB L1 on die), I'm told. It may be the second- or third-highest specint95 core out there.
Also, K7 kicks ass at rc5 -- pass it along!
-JC
PC News'n'Links -
Intel and Rambus and the K7
Nice post. A few points:
> The point is that other companies aren't doing what intel is telling them to. And intel really doesn't like that.
Indeed. Add to this that Intel has a large stake in Rambus and that every single Rambus module sold will result in a royalty payment to Rambus and you can see why they don't want PC133 and PC266 to succeed.
> The K7 may have a 200 mhz bus between the chipset and the processor
According to this great article on K7 rumours, Slot A will be able to run up to about 250MHz, but Slot B will go up to 400MHz. Yum yum!
> VIA made K7 chipsets [will support PC133]
Here in time for Christmas by the looks of things. This sabre rattling by Intel might even make takeup faster. If people are worried about whether VIA has the rights to the GTL+ bus, VIA might advise them to use the EV6 stuff for the 21264 and K7 instead. If only AMD would second-source the K7 so people could really believe that supplies will be reliable. You don't piss off Intel unless you are very sure you won't have to come crawling back. Actually, I did see some rumours of a second source for the K7. IBM and Samsung would be the obvious candidates.
> While future K7 chipsets will support RAMBUS
It would be ironic if high end K7 chipsets were delayed because they decided to invest a lot of effort getting Rambus to work, and then the RAM modules don't turn up. I think for the high end, with huge 2nd level caches and enormous bandwidth requirements Rambus may have the edge if the caches take the top off the latency problems, and AMD may have thought the same way. And who would have guessed that an Intel-sponsored technology could fail in the PC space?
-
Slashdotted? Try this.
Looks slashdotted already. If, like me, you can't get through, try JC's page for more K7 info and rumours.