aceshardware.com · Domains · Slashdot Mirror

My quiet case project : it's an answer ... sort of by Prozzaks · 2001-06-04 12:50 · Score: 1 · on Building Quieter Computers

Well, it seems these days most power users just care about getting something like 200fps in Quake III. Why? Beats me! I'm not on a quest for the ultimate frame rate; I just want my box to be as quiet as it can possibly be.

To help you understand my take on the subject, here is the background :
My PC has the following components :

An OEM case
A 235W OEM power supply
ASUS P3B-F
Intel Pentium II rated 400MHz @ 400MHz
A cheap OEM SECC2 heat sink made of aluminum
A 128MB CAS2 no-name DIMM
Two 32MB CAS3 Samsung DIMMs, slowing down my memory timings but preventing the appearance of the almighty evil swap
An ATI All-In-Wonder Rage128 16MB
A Creative SoundBlaster Live! Value
A Realtek 8139 Ethernet NIC
My beloved USR 56Kbps ISA real modem. Sorry, but to me a component that uses CPU power to do its processing instead of taking the load off is not worthy of being in my computer. Not to mention the M$ Win part...
A Creative 48x CD-ROM drive. It's the loudest damned thing in my computer when it's spinning
A Quantum Fireball AS PLUS 40GB (7200RPM) in a removable tray
A Quantum Fireball CX1 10GB (5400RPM) mounted inside the case
Of course, the stupid old 1.44MB floppy drive, only used for booting tomsrtbt in case of emergency

Soon to be :

An Adaptec 2940UW
A Diamond Monster 3D II for Glide games

It turns out that the Quantum Fireball AS makes less noise than the Quantum Fireball CX1. I still have to figure out why...

I use my PC for :

Running Linux and learning as much as time allows me (Jeez, I had so much time when I was a student... Think of all the time I wasted in high school running the evil W monster)
Doing some gaming, i.e. Diablo II, Unreal, UT, Undying (although that thing is going to cost me a new box)
Spending numerous nights filling my brain @ Slashdot, Tomshardware, Anandtech, Arstechnica, StorageReview, developer.intel.com, and most importantly, scouring the web for all the case manufacturers and their take on a quiet box.

As I'm writing this post, which is probably going to be the base documentation for my Silent Case Project, you're guessing that my sleepless nights of browsing have not yielded the desired result.

I've checked out many options, such as water cooling, moving the PC to the closet, or returning to the forest, where a PC is pretty far from your everyday quest for survival. None of them suits me.

The objective of my project is to build a case that meets the following criteria :

As silent as possible
Accessible
Provides sufficient ventilation to keep all the components running within thermal specs
Light enough to be easily transportable (let's not forget the LAN parties ;-)

To attain those goals I have to :

Read all I can about noise, sound, aerodynamics, PC specs
Find suitable materials: a case is not just protection against unwanted fingers and dust; it must provide EMI shielding and proper grounding, resist impacts, and fit my conception of the kind of object you want in your bedroom (if you were thinking about plywood and a box of rusted leftover nails, forget it)
Find the tools or the companies or individuals with the means to work the materials I choose to build the casing

For the sound insulation I was thinking about some kind of foam. Mineral wool would be effective, but it takes too much space and it's not the kind of thing I want beside my bed. For the casing itself, metal is almost inevitable if you want EMI shielding and grounding. And for those of you who wonder why I have not given water cooling more consideration: the greatest source of noise is not my CPU cooler, and you're just moving the problem out of the case (nice, you have water heating up, but unless your reservoir is like a bathtub or something, you will still have to transfer the heat from the water to the air).

That's about as far as I am. If you have any ideas that might help me, please feel free to send me some bits forming ASCII characters at Prozzaks@operamail.com

To finish up, here is a list of things that might help people wanting to achieve similar goals:

http://www.formfactors.org/ You should be able to find all the documents regarding the ATX form factor and thermal design guides. A must if you want to build a quiet PC.
http://developer.intel.com/ Intel has contributed a great deal to the ATX definition ; here you will find many relevant documents including thermal design guides for all Intel processors.
Extract from my favorites:

Hardware\cases
PC CASE
Fong Kai
PowerOn
Enlight Corporation
dir.yahoo Enclosures Manufacturers
procase
YY Computer
Psi
IN WIN
Amtrade
American Suntek
Addtronics
A-Top Technology, Inc
Nikao
Palo Alto Products
Antec
Lian-Li
amaquest
Koolance
Quietpc
PC Power & Cooling

Hardware\Heat Sinks
ALPHA
Cooler Master
AVC
ekl
GlobalWIN
globefan
RDJD
Foxconn
Spring Spread
Sanyo Denki
TITAN
TaiSol
ChipCoolers
Orb a
ElanVital

Hardware\Info\Form Factor
Platform Development Support
SSI
WTX

Hardware\Info\Standards
Fibre Channel Industry Association
PCI SIG
RAB
serialata
SPEC

Hardware\Info\Storage
RAID.edu

Hardware\Info\Cours
CS 252 - Graduate Computer Architecture

Hardware\Info
The PC Guide!
Hardware Bible
FullOn3D
developer.intel.com
HwB The Hardware Book
United Overclockers
Ars Technica
Tech-Junkie
HardwarePub
Webopedia
Illustrated Guide to the PC Hardware
SysOpt
2CPU
Ace's Hardware
Technical Support - RaidHelp v1.0 - Free RAID Technology Guide
Computer Architecture
OPENCORES.ORG
TechFest
MidWest Micro Support

Hardware\Resalers
GeekTek!
Micro-Bytes
ALCO
ABC Micro
2CoolTek
Plycon Computers
TCWO
ABC Micro - Lprix
Case Outlet
The Chip Merchant, Inc
Cimsys
OrdiGros
ALIENWARE
SHENTECH
FireStorm
Hyper Microsystems
TWEAKBOX

Hardware\Reviews
Tom's Hardware Guide
Sharky Extreme
StorageReview
HardOCP
AnandTech
SystemLogic
x-bit labs
Active-Hardware
FiringSquad
SocketA
Overclockers Australia
HEXUS
dansdata
SysReview

Hardware\Manufacturers
AMD
ASUS
Belkin
MassMultiples
Promise
StarTech
VIA Technologies, Inc
ABIT Computer Corp
Comcase
Micron Semiconductor
ECS

Hardware
Freeboxen

Re:Benchmarks? by Sketch · 2001-05-29 01:13 · Score: 1 · on SGI 750 Itanium Server

There are some SPEC benchmarks and commentary up on aceshardware.com.

Interesting that Intel appears to have finally released a CPU with good (great, even) fp performance. Too bad it sucks for integer...

OpenVerse Visual Chat: http://openverse.org

More coverage... by jeffsenter · 2001-05-13 21:49 · Score: 5 · on AnandTech Peeks At The Athlon 4

Ace's Hardware has a nice summary and set of links for the Athlon 4.

Unfortunately Sharkyextreme and HardOCP do not have reviews of the chip up for comparison yet.

Tom's does have a review up.

speed by Anonymous Coward · 2001-04-27 01:08 · Score: 3 · on Clawhammer to be 1/2 size of P4

The Duron is currently 100mm² (Al) while the Athlon is 120mm² (Cu). That the Clawhammer is 105mm² doesn't seem that impressive given the shrink from .18 to .13 SOI, especially since the PentiumIV will be down to 116mm² (rumor: and possibly have an upgrade from 256K L2 to 512K L2 cache!). The fact that the Athlon is currently half the size of the PentiumIV and is overall slightly faster is (was) newsworthy. The fact that the Clawhammer is only going to be 91% of the size of the PentiumIV isn't really a big deal. The PentiumIV taking 10% more die space isn't going to add up to much, especially since Intel has much more than 10% more fab space to throw at it.

According to Ace's Hardware's 2001 shareholders' meeting coverage, the Clawhammer will be introduced in the 2nd half of 2002. Anybody want to guess at what "But it will deliver more than three times the clock speed of the first Athlon..." means? According to sandpile, the first Athlon was introduced on a .25 process at 500, 550, and 600MHz on Jun 23rd 1999, but went up to 700MHz by Oct 4th before switching to a .18 micron process. I guess this means the Clawhammer, with its .13 micron SOI manufacturing and architectural enhancements over a standard K7, will reach at least 1.5GHz in 2H 2002? I should hope so! 2GHz seems like a better low-end guess for clock speed, but it'll probably be closer to 2.5GHz than 2.
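
For anyone who wants to check the arithmetic in the post above, here is a small Python sketch. It only uses the figures quoted in the post (die sizes of 105 and 116, launch clocks of 500, 550 and 600 MHz); the function names are made up for illustration, and the rumored numbers are taken at face value rather than verified.

```python
# Figures quoted in the post above; the die sizes are rumors, not verified data.
CLAWHAMMER_DIE = 105          # quoted Clawhammer die size
PENTIUM4_SHRUNK_DIE = 116     # quoted size of the die-shrunk PentiumIV
FIRST_ATHLON_MHZ = [500, 550, 600]  # launch speeds of the first Athlon


def area_ratio(smaller, larger):
    """Return the smaller die size as a fraction of the larger one."""
    return smaller / larger


def tripled_clock_ghz(mhz):
    """Return three times a clock given in MHz, expressed in GHz."""
    return 3 * mhz / 1000


if __name__ == "__main__":
    ratio = area_ratio(CLAWHAMMER_DIE, PENTIUM4_SHRUNK_DIE)
    print(f"Clawhammer is {ratio:.0%} of the shrunk P4 die")
    print(f"P4 is {1 / ratio - 1:.0%} larger than Clawhammer")
    for mhz in FIRST_ATHLON_MHZ:
        print(f"3 x {mhz} MHz = {tripled_clock_ghz(mhz):.2f} GHz")
```

Depending on which launch speed counts as "the first Athlon", "more than three times" puts the floor somewhere between 1.5 and 1.8 GHz, which is consistent with the post's "at least 1.5GHz" reading.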

Re:Java vs C++ by Glock27 · 2001-04-24 03:11 · Score: 1 · on Next Generation C++ In The Works

Java does have some things going for it. It's a neat little platform for applets and set-top boxes. But it is vastly overused in settings for which it is painfully unsuited (like Web servers). And certain pathetic hacks to get around the confinements of Java (such as using "interfaces" to pretend for a second that Java can almost simulate multiple inheritance, but wretchedly) just cause more spaghetti code, pain, and suffering.

Java is pretty well suited to large scale programming. Lack of multiple inheritance is not a big deal in practice...heck Smalltalk (touted as the end-all OO language by its aficionados) doesn't even have interfaces, only straight single inheritance.

Current Java implementations are very competitive with (and often better than) C++ performance. See the "Binaries vs. Bytecodes" article at Ace's Hardware for one example. Yes, Java uses a lot of memory...but memory is CHEAP! Why do we really need 64 bit desktop chips? ;-)

Also don't forget that gcc 3.0 will have a traditional (ahead of time) Java compiler for those memory-tight embedded applications.

In the meantime, my company is distributing several scientific Java apps, and supporting them on Windows, Linux and soon MacOS X. Sweeeeet!

186,282 mi/s...not just a good idea, it's the law!

Reduced operating temps, reduced voltage ... by JoeGee · 2001-03-25 04:17 · Score: 1 · on AMD focuses efforts on Palomino core

I am interested in reading about the technologies AMD is using to reduce the thermal output of these chips. I have heard that a special (more pure, or possibly a different isotope) form of silicon is used. The core will also run at a much lower voltage.

Over on Ace's Hardware there's a story that passive cooling may be all that is needed for Palomino chips running at 1.5 GHz. Is it hype, is it truth? It would certainly be nice to be able to upgrade from my 1.0 GHz space heater to a 1.5 GHz cup warmer. :)

From what I have seen on the various tech sites, it looks like Palomino will put AMD on near parity with Intel in regard to raw MHz. As the current Athlon core demonstrates, clock speed is less relevant than efficiency, but for sheer bragging rights on the desktop, clock for clock even the current Athlon core *greatly* exceeds the performance of any similarly clocked Intel chip used for comparison. If Palomino has enhanced cache performance or better branch prediction, it will humble Intel's best even further.

I will be interested in seeing what kind of speeds Palomino can reach on laptops.

I do not look for the P-4 to be competitive with the Athlon until it undergoes a die shrink to .13 micron to make room for more cache, and gains another FPU (as was originally planned.) I think Intel can look forward to a summer of humiliation until they get the die shrunk P-4 out to their OEMs.

And then six months later according to all indicators Sledgehammer/Clawhammer will jump up and down on P-4 and smash it into tiny little twitching bits right before it sinks its teeth into Itanium. There's no rest for the wicked. :)

2 worst reviews by ruiner5000 · 2001-03-22 03:35 · Score: 3 · on AMD Challenges P4 With 1.33Ghz

Wow, the two worst reviews from the Intel-biased sites get posted. Surprise, surprise. Here are much better reviews from sites that have not sold out. :)

AMDZone
Gamer's Depot
Ace's Hardware
GotApex?.

And here is a presentation with benchmarks and a roadmap. Have fun. Don't let biased slashdot postings warp your mind!

Re:Don't do it in Java by gchanot · 2001-03-15 22:02 · Score: 2 · on The Fastest Web Language On The 'Net?

OK, let me try again!

Java vs C

Java 3D applet

Switches invalidate the results (also: 4-way SMP) by NortonDC · 2001-02-01 13:57 · Score: 3 · on Dual Athlon Preview: Linux Kernel Compile Smokes

The "-j3" switch with the make is why it got a greater-than-linear improvement.

See Ace's Hardware for a discussion of exactly this:

"[T]he dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles."

Also, see reader comments saying that AMD demonstrated a 4-way SMP Athlon system at LinuxWorld.
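
To make the quoted explanation concrete, here is a toy tick-based model in Python. Everything in it is an assumption for illustration: the 12 jobs, the 1 tick of I/O wait followed by 3 ticks of CPU per job, and the simplification that disk waits never contend with each other. It is not the benchmark from the article, only a sketch of why extra make processes keep a processor busy.

```python
def build_time(n_jobs, io_ticks, cpu_ticks, slots, cpus):
    """Ticks needed to finish n_jobs with `slots` parallel jobs on `cpus` processors.

    Toy model: every job first waits io_ticks on disk (using no CPU), then
    needs cpu_ticks of work on one processor. cpu_ticks must be at least 1.
    """
    pending = n_jobs
    running = []  # each entry is [io_ticks_left, cpu_ticks_left]
    ticks = 0
    while pending or running:
        # Start new jobs while a make slot (the -j limit) is free.
        while pending and len(running) < slots:
            running.append([io_ticks, cpu_ticks])
            pending -= 1
        free_cpus = cpus
        for job in running:
            if job[0] > 0:
                job[0] -= 1        # still waiting on I/O
            elif free_cpus > 0:
                job[1] -= 1        # computing on one processor this tick
                free_cpus -= 1
        running = [job for job in running if job[1] > 0]
        ticks += 1
    return ticks


if __name__ == "__main__":
    serial = build_time(12, 1, 3, slots=1, cpus=1)   # plain "make", one CPU
    uni_j2 = build_time(12, 1, 3, slots=2, cpus=1)   # "make -j2", one CPU
    dual_j2 = build_time(12, 1, 3, slots=2, cpus=2)  # "make -j2", two CPUs
    dual_j3 = build_time(12, 1, 3, slots=3, cpus=2)  # "make -j3", two CPUs
    for name, t in [("serial", serial), ("uni -j2", uni_j2),
                    ("dual -j2", dual_j2), ("dual -j3", dual_j3)]:
        print(f"{name:9s} {t:3d} ticks, speedup {serial / t:.2f}x")
```

In this model the dual-processor -j3 run beats the serial run by more than 2x purely because the serial baseline idles during I/O, and the single-processor -j2 run also beats plain serial, which is the point the quoted passage makes.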

Ace's Hardware by NovaX · 2001-02-01 13:46 · Score: 5 · on Dual Athlon Preview: Linux Kernel Compile Smokes

There's a better news bite at Ace's about this. Basically, the second compilation used 3 threads, so the CPU may have had less idle time and less of an I/O bottleneck than in the single-processor run.

"Unfortunately, the benchmarks vary significantly between the two tests in that the first is completely serialized while the second (dual-processor) test is run with three parallel make processes (notice the -j flag). Because the first system is running with only a single build instance, the processor is spending a great deal of time simply waiting on IO. Meanwhile, the dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles.

"Until then, it is very difficult to make a representative statement about the performance of a dual-processor Athlon system from this benchmark."

-----------------------------------------

Re:can you guys PLEASE get some REAL information? by Namarrgon · 2000-12-29 02:03 · Score: 1 · on The Pentium IV Dissected

I have a 1.6GHz P4 computer system (prerelease, not overclocked) and a IA64 system here at work, and as *Tom's Hardware Guide* clearly points out here in their *latest* comparison...

Can't say I entirely trust Tom's these days. He's inclined to be a little slipshod and biased, IMHO. Look how much he raved about DivX ;-) - "as good as DVD", indeed.

If you'll look at the very next page from that link you posted, the P4 gets roundly beaten on the next test, and further, Tom tells us that BapCO (who wrote that particular benchmark) actually have their offices inside Intel's Santa Clara building! Guess which CPUs they optimised for?

The truth is, the P4 has strong areas and weak areas. In RAM-bandwidth and SIMD-optimised benchmarks, it flies very nicely. In straight FP and complex decision code, it suffers, sometimes very badly. I saw a benchmark, SuperPI, where a 1.5 GHz P4 performed at less than half the speed of a 1.2 GHz Athlon.

In fact, forget this article. Read the one at Ace's Hardware instead - it's much more informed.

Namarrgon

Re:The P4 is the world's fastest microprocessor. by ToLu the Happy Furby · 2000-12-15 07:01 · Score: 2 · on P4 - The Art Of Compromise

This has been said often enough for so many different processors that it has become trite. From experience, extra bits of compiler optimization rarely pay off in a big way. Quite often, it is impossible to tell the difference between minimal and full optimization settings. I suspect that contrived examples are being used for benchmarks, such as an image filter that takes 10 seconds to run and spends all its time inside of a 16 instruction loop. Sure, one tweak to the scheduler will make it run in 8 minutes instead, but how realistic is this? It isn't a win in the general case.

That's why I was talking about SPEC_CPU, the most comprehensive and well balanced CPU benchmark suite on the planet, and not some crappy toy benchmark. Indeed, the P4 does very well on recompiled toy benchmarks as well, but I didn't mention them because they don't tell us anything useful.

FYI, SPEC_CPU is about as far from some "image filter that takes 10 seconds to run and spends all its time inside of a 16 instruction loop" as one can get. Indeed, it is a suite consisting of no less than 28 benchmarks, each designed to stress different algorithmic and data set size combinations, and each very non-trivial. It is the industry's only truly cross-platform benchmark, and it is designed and revised every few years by a committee consisting of some of the foremost experts on high-performance and scientific computing, and advised by every significant MPU vendor to assure fairness. It does not, as you imply, allow any hand-tweaking of assembly code, nor--like most benchmarks--does it come in the form of precompiled binaries which may favor one platform over another. Instead, it comes completely as source code, to be compiled by a vendor supplied compiler--which must be publicly available within a certain time frame--under very specific regulations. The "base" and "peak" categories refer to different levels of allowable customization in the compiler settings, and indeed all compiler flags used must be revealed along with the results. And rather than taking 10 seconds, a full SPEC_CPU run takes a couple hours even on a P4 or high-end Alpha; on the reference machine (i.e. a SPEC_CPU2000 score of 100) it would take something like 12 hours!
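
For readers unfamiliar with how a score relative to a reference machine works, here is a minimal Python sketch. As far as I recall from SPEC's published methodology, each benchmark's ratio is the reference machine's run time divided by the measured run time, times 100, and the overall figure is the geometric mean of those ratios. The function name and the timings below are invented for illustration and are not real SPEC results.

```python
import math


def spec_style_score(reference_seconds, measured_seconds):
    """Geometric mean of (reference time / measured time) * 100 per benchmark."""
    if len(reference_seconds) != len(measured_seconds):
        raise ValueError("need one measured time per reference time")
    ratios = [100.0 * ref / run
              for ref, run in zip(reference_seconds, measured_seconds)]
    # Average the logarithms, then exponentiate: that is the geometric mean.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical run times in seconds for three benchmarks.
    reference = [1400.0, 1800.0, 3000.0]
    machine = [350.0, 600.0, 500.0]
    print(f"reference machine: {spec_style_score(reference, reference):.0f}")
    print(f"test machine:      {spec_style_score(reference, machine):.0f}")
```

The geometric mean keeps one unusually fast benchmark from dominating the result the way it would with an arithmetic mean, which fits the post's point that a single tweaked loop does not move the overall score much.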

So, nice try. But trust me, the only way to beat SPEC_CPU is to build a really fast CPU. It also helps to have an amazing compiler--which Intel does with its VTune 5.0 compilers--but that allows nowhere near the potential for unfair binaries that precompiled benchmarks do. Also, being aimed at the high-performance market rather than the PC market, SPECfp2000 has been criticized by some as "unfairly" rewarding the very large memory bandwidth of the P4 compared to the P3 and SDR SDRAM Athlon. For an IMO interesting technical discussion of this issue, you might want to see this thread over at Ace's Hardware. (See if you can guess who I am. :)

Re:Why I love Java and why I hate java by harmonica · 2000-12-01 08:19 · Score: 2 · on Why Linux Lovers Jilt Java

>Swing sucks. AWT sucks. GTK+ should have Java hooks (now that Sun is a GNOME backer.) I cannot get an App that doesn't look like ass.

Get some decent Swing apps like Gnutella or jEdit. Use a decent VM. They look great.
And they look great on Win32, Linux, Solaris, Irix, HP-UX, Tru64 Unix, OS/2, Macintosh, some IBM OS's I never heard of, you name it.

>Slugishness. Java gets its ass kicked by C in speed.

No, it doesn't. Read the article Binaries vs Byte-Codes.

On the other points that you mentioned, some other replies dealt with them or I don't have a quick link at hand. You're generalizing a lot. Redundancy in Integer.parseInt - that's a problem? It's a class method, and there are class methods for other types in java.lang.Integer. No redundancy here.

Crunching is not really much better in JNI by binkless · 2000-11-30 22:58 · Score: 1 · on Why Linux Lovers Jilt Java

Check this out. FFTs in java!

Voodoo5 won't work in a P4 board! by AFCArchvile · 2000-11-29 22:31 · Score: 2 · on Pentium 4 Systems Recalled By Some U.S. Stores

Check it out at Ace's Hardware. The reviewer wanted to test the system with a non-T&L card, but he couldn't insert the V5 into the AGP slot! There are two notches in the standard AGP connector design, and the V5 is missing one, and it just so happens that the AGP slot on the i850 board has a key which lines up with the missing notch. Therefore, either 3dfx is in a big bind, or V5 0w|\|3Rs everywhere will whip out a Dremel and saw the notch themselves!

Benchmarks miss the point! by My Third Account · 2000-11-23 12:02 · Score: 3 · on Tom's Hardware Retracts P4 Endorsement

The FPU in the P4 is there for x86 compatibility. Intel is betting that software developers will use some of the P4's 144 new instructions to accomplish floating point operations. The new instructions, if used properly, could realize significant speed increases.

Further, as was mentioned earlier, Intel has always released new generations of CPUs that didn't exactly take the benchmark world by storm. Wait 'til software emerges that takes advantage of what the P4 has to offer. Then you can try to complain.

Didn't anyone read what Paul DeMone had to say? Or Ace's review of the P4?

Stream Benchmark ought to silence Rambus skeptics by VAXman · 2000-11-19 23:39 · Score: 1 · on Intel RoadMap with P4 Stats To Boot

Available here.

Wow. 2-3x as fast as Athlons and P3s running DDR or SDR.

More P4 Reviews by bliksem9 · 2000-11-19 22:16 · Score: 1 · on Intel RoadMap with P4 Stats To Boot

Ace's hardware Pentium 4 review
Linpack, a buttload of different game engines (11 or so), povray, truespace and C++ compiling

Hardocp
sisoft, q3, ZD benchmarks

Re:I'm sorry... by B1ood · 2000-11-09 02:48 · Score: 3 · on Sun's (un)official response to .NET

But you can't outrun native code, no matter how good your universal language is.

That is usually the case, very true, but cases do exist where interpreted code (or byte-compiled interpretation like Java) has performed just as well or better. This article illustrates that better than I can say it in a reasonable number of words. It all comes down to just how closely the Java instructions in the bytecode parallel the native instruction set of the underlying processor, and the ability of the JVM to remove its own overhead.

B1ood

Re:You have way too much time on your hands, frien by ToLu the Happy Furby · 2000-10-24 06:02 · Score: 2 · on AMD vs Intel: CPU Design Philosophy

First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.

This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.

1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.

2) This was neither review nor rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.

3) Humorously enough, the "self-promoting" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.

4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.

And, just FYI, I have read every single article on microprocessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.

ROFLMAO!

You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.

Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)

Second, it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)

No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.

Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.

You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?

Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed.

First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.

And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.

Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
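To put rough numbers on that claim, here is a minimal sketch that treats performance as simply IPC times clock frequency. That model is a first-order assumption (it ignores memory, compilers, and workload mix), and the function name is made up for illustration; the inputs are the roadmap and DeMone figures quoted above, not measurements.

```python
# First-order model only: performance ~ IPC x clock frequency.
# relative_performance is a hypothetical helper, not from any article cited here.

def relative_performance(ipc_gain: float, clock_gain: float) -> float:
    """Speedup over a baseline core, given fractional IPC and clock gains."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain)

# Figures from the post: 100% better clock scaling, 15-20% better integer IPC.
low = relative_performance(ipc_gain=0.15, clock_gain=1.00)
high = relative_performance(ipc_gain=0.20, clock_gain=1.00)
print(f"Projected integer speedup over P6: {low:.2f}x to {high:.2f}x")
```

If both projections held, the two gains would multiply rather than add, which is why a core that improves clock and IPC together looks like a generational jump.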

Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.

Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.

BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.

Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???

HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!

Ok, I'm over it now. Phew.

Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.

Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
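For anyone wondering where the 1.6 and 1.06 GB/s figures come from, here is a quick sketch of the peak-bandwidth arithmetic. It assumes a 64-bit data path on both sides, 200 million transfers per second on the EV6 FSB (100 MHz, double-pumped), and 133 million on PC133 SDR SDRAM; the helper name is made up for illustration, and these are theoretical peaks, not sustained throughput.

```python
# Peak bandwidth = bus width in bytes x transfers per second.
# peak_bandwidth_gb_s is a hypothetical helper for illustration.

def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9

fsb = peak_bandwidth_gb_s(64, 200e6)    # EV6 FSB: 100 MHz, two transfers per clock
dram = peak_bandwidth_gb_s(64, 133e6)   # PC133 SDR SDRAM: one transfer per clock

print(f"FSB peak:  {fsb:.2f} GB/s")
print(f"DRAM peak: {dram:.2f} GB/s")
print(f"FSB headroom the DRAM cannot fill: {fsb - dram:.2f} GB/s")
```

The gap between the two numbers is the bandwidth that sits idle until faster memory shows up on the other side of the chipset.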

Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.

But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. As far as memory performance goes, though, the P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, if more expensive.

But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.

More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.

Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)

Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.

First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.

But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me neither.

And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.

Uhhuh. That's why the Katmai P3--with its half-speed 512 KB L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512 KB L2--is so much better than TBird??

You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!

Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
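A back-of-the-envelope sketch of the scale gap being described. The 1 GHz clock and the 7 and 11 cycle L2 latencies come from the post; the 10 ms scheduler timeslice is purely an assumption for illustration (real values vary by OS), and the function names are made up.

```python
# Compare how often a context switch happens with how often L2 latency is paid.
# ASSUMPTION: a 10 ms scheduler timeslice; the other figures are from the post.

def cycles_per_timeslice(clock_hz: int, timeslice_ms: int) -> int:
    """CPU cycles that elapse during one scheduler timeslice."""
    return clock_hz * timeslice_ms // 1000

def l2_hits_per_timeslice(clock_hz: int, timeslice_ms: int, l2_latency: int) -> int:
    """Upper bound on back-to-back L2 hits that fit in one timeslice."""
    return cycles_per_timeslice(clock_hz, timeslice_ms) // l2_latency

CLOCK_HZ = 1_000_000_000   # "over one billion cycles per second"
TIMESLICE_MS = 10          # assumed, not from the post

print("Cycles per timeslice:", cycles_per_timeslice(CLOCK_HZ, TIMESLICE_MS))
for latency in (7, 11):
    hits = l2_hits_per_timeslice(CLOCK_HZ, TIMESLICE_MS, latency)
    print(f"{latency}-cycle L2: up to {hits} serialized hits per timeslice")
```

Under that assumption a switch is paid for once per several million cycles, while the extra 4 cycles of L2 latency are paid on every single L1 miss in between.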

And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...

No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.

If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.

As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.

Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?

How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)

Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.

Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.

What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.

Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.

Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.

I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.

I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.

I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.

Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff.

Re:You have way too much time on your hands, frien by ToLu the Happy Furby · 2000-10-24 06:02 · Score: 2 · on AMD vs Intel: CPU Design Philosophy

First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.

This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.

1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.

2) This was neither review nor rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.

3) Humorously enough, the "self-promoting" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.

4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.

And, just FYI, I have read every single article on microprocessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.

ROFLMAO!

You read the scant handful of poorly chosen architecture articles linked from Slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year-old Anand.

Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)

Second, it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)

No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.

Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.

You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?

Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed.

First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.

And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.

Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.

Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.

Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.

BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.

Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???

HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!

Ok, I'm over it now. Phew.

Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.

Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.

Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.

But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. As far as memory performance goes, though, the P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, if more expensive.

But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.

More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.

Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)

Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.

First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I were doing scientific computing, I would go with a cluster of Durons in a heartbeat.

But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me neither.

And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.

Uhhuh. That's why the Katmai P3--with its half-speed 512 KB L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512 KB L2--is so much better than TBird??

You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
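To put numbers on the bus-width point, here is a back-of-the-envelope sketch of peak L2-to-core bandwidth. It assumes one transfer per core clock and a 256-bit L2 bus on the P4 (which is what the 48 GB/s figure implies at 1.5 GHz). The clock speeds picked for each chip are just illustrative choices, and the function name is made up.

```python
# Peak L2->core bandwidth = bus width in bytes * core clock.
# Assumptions: one transfer per core clock; the P4's L2 bus is 256 bits wide.

def l2_bandwidth_gb_s(bus_bits: int, core_clock_ghz: float) -> float:
    """Peak L2 bandwidth in GB/s for a full-speed on-die cache."""
    return (bus_bits / 8) * core_clock_ghz

tbird = l2_bandwidth_gb_s(64, 1.2)        # Thunderbird: 64-bit bus at 1.2 GHz
coppermine = l2_bandwidth_gb_s(256, 1.0)  # Coppermine: 256-bit bus at 1.0 GHz
p4 = l2_bandwidth_gb_s(256, 1.5)          # P4: 256-bit bus at 1.5 GHz

for name, bw in [("Thunderbird", tbird), ("Coppermine", coppermine), ("P4", p4)]:
    print(f"{name:12s} {bw:5.1f} GB/s")
```

Refilling a cache after a context switch is bandwidth-bound work, so on these assumptions the wider bus can only help.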

Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much coarser granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
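A quick way to see why L2 latency shows up and the scheduler does not is the textbook average-memory-access-time formula. Everything numeric below except the 7 and 11 cycle L2 latencies is a made-up illustrative assumption (hit time, miss rates, memory latency, the 10 ms timeslice), so treat it as a sketch of the reasoning rather than a measurement.

```python
# Average memory access time (AMAT), two-level cache version:
#   AMAT = L1 hit time + L1 miss rate * (L2 latency + L2 miss rate * memory latency)
# All rates and the memory latency below are hypothetical.

def amat(l1_hit, l1_miss_rate, l2_latency, l2_miss_rate, mem_latency):
    """Average cycles per memory access for a two-level cache hierarchy."""
    return l1_hit + l1_miss_rate * (l2_latency + l2_miss_rate * mem_latency)

L1_HIT = 3            # cycles (assumed)
L1_MISS_RATE = 0.05   # assumed
L2_MISS_RATE = 0.20   # assumed
MEM_LATENCY = 100     # cycles (assumed)

amat_7 = amat(L1_HIT, L1_MISS_RATE, 7, L2_MISS_RATE, MEM_LATENCY)
amat_11 = amat(L1_HIT, L1_MISS_RATE, 11, L2_MISS_RATE, MEM_LATENCY)
print(f"AMAT with  7-cycle L2: {amat_7:.2f} cycles")
print(f"AMAT with 11-cycle L2: {amat_11:.2f} cycles")

# Scheduler granularity for comparison: an assumed 10 ms timeslice at 1 GHz.
cycles_per_timeslice = int(0.010 * 1_000_000_000)
print(f"Cycles between context switches: {cycles_per_timeslice:,}")
```

The L2 latency term is paid on every L1 miss, while a context switch under these assumptions comes around once every ten million cycles.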

And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...

No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.

If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.

As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.

Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?

How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)

Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.

Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.

What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.

Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.

Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.

I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.

I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.

I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.

Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff.

Re:This isn't a discussion about design philosophy by ToLu+the+Happy+Furby · 2000-10-23 13:27 · Score: 3 · on AMD vs Intel: CPU Design Philosophy

Ugh. Ignorant crap getting a +4 insightful. Well, let's get this over with...

Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.

Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.

The article itself doesn't say anything the knowledgeable don't already know.

This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependency scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you, troll?)

In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.

Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.

Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.

I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.

Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.

ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.

So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.

Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?

In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.

Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII led to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):

Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephalopathy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.

...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.

Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design. Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.

Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.

"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.

Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.

Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 are scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.

The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384 KB total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256 KB total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
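Here is a small sketch of where the 384 vs. 256 totals come from and why the narrow bus hurts. The L1 sizes (128 KB on the Thunderbird, 32 KB on the Coppermine) and the cache line sizes (64 and 32 bytes) are figures I am supplying from memory, not from the article, and "one bus beat per core clock" is a simplification. The function names are made up.

```python
import math

# Exclusive hierarchy: L1 and L2 hold different lines, so capacities add.
# Inclusive hierarchy: L2 duplicates L1, so usable capacity is just L2.

def effective_capacity_kb(l1_kb: int, l2_kb: int, exclusive: bool) -> int:
    """Unique on-die cache capacity in KB."""
    return l1_kb + l2_kb if exclusive else l2_kb

def line_transfer_beats(line_bytes: int, bus_bits: int) -> int:
    """Bus beats needed to move one cache line between L2 and L1."""
    return math.ceil(line_bytes * 8 / bus_bits)

# Assumed parameters (see the note above).
tbird_total = effective_capacity_kb(l1_kb=128, l2_kb=256, exclusive=True)
cumine_total = effective_capacity_kb(l1_kb=32, l2_kb=256, exclusive=False)

tbird_beats = line_transfer_beats(line_bytes=64, bus_bits=64)
cumine_beats = line_transfer_beats(line_bytes=32, bus_bits=256)

print(f"Thunderbird: {tbird_total} KB unique cache, {tbird_beats} beats per line")
print(f"Coppermine : {cumine_total} KB unique cache, {cumine_beats} beat per line")
```

So on these assumptions the Thunderbird buys 128 KB of extra unique capacity, and pays for it with several times as many bus beats on every L2 hit, on top of the longer initial latency.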

Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.

"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. 
But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.

The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 GB/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
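The "probably a wash" claim can be framed with a simple first-order model: cycles lost per instruction equal the fraction of instructions that are branches, times the mispredict rate, times the penalty. The 12 and 19 clock penalties are the official figures quoted above; the branch fraction and the P6 mispredict rate are hypothetical numbers picked only to show the break-even calculation.

```python
# First-order branch cost model:
#   cycles lost per instruction = branch fraction * mispredict rate * penalty

def mispredict_cpi(branch_fraction: float, mispredict_rate: float, penalty: int) -> float:
    """Average cycles per instruction lost to branch mispredicts."""
    return branch_fraction * mispredict_rate * penalty

def break_even_rate(old_rate: float, old_penalty: int, new_penalty: int) -> float:
    """Mispredict rate at which the longer pipeline loses no more cycles."""
    return old_rate * old_penalty / new_penalty

BRANCH_FRACTION = 0.20   # assumed: one instruction in five is a branch
P6_RATE = 0.10           # assumed P6 mispredict rate
P6_PENALTY = 12          # clocks, official figure quoted above
P4_PENALTY = 19          # clocks, trace cache hit case quoted above

p6_cost = mispredict_cpi(BRANCH_FRACTION, P6_RATE, P6_PENALTY)
p4_rate_needed = break_even_rate(P6_RATE, P6_PENALTY, P4_PENALTY)
p4_cost = mispredict_cpi(BRANCH_FRACTION, p4_rate_needed, P4_PENALTY)

print(f"P6 cost per instruction       : {p6_cost:.3f} cycles")
print(f"P4 mispredict rate to match it: {p4_rate_needed:.1%}")
```

In this model the P4 only needs to cut mispredicts to 12/19 of the P6's rate to break even, whatever the starting rate is, and that ignores the rescheduling overhead the post argues is worse on the P6.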

It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note:WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.

Blah blah blah, biased statements towards Ace's.

Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you.

Re:This isn't a discussion about design philosophy by ToLu+the+Happy+Furby · 2000-10-23 13:27 · Score: 3 · on AMD vs Intel: CPU Design Philosophy

Ugh. Ignorant crap getting a +4 insightful. Well, let's get this over with...

Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.

Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.

The article itself doesn't say anything the knowledgeable don't already know.

This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependency scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you, troll?)

In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.

Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.

Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good old-fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, in quite an authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.

I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.

Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.

ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.

So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.

Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?

In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.

Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII led to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):

Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephalopathy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.

...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.

Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design. Thus the logical conclusion is that, just like the preproduction MPUs "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
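
To make "clock-normalized" concrete, here is a minimal Python sketch. The function name and the scores are hypothetical placeholders, not numbers from any of the previews mentioned above; the only point is that each score is divided by its own clock before the two chips are compared.

```python
def clock_normalized_ratio(score_a, mhz_a, score_b, mhz_b):
    """Per-clock performance of chip A relative to chip B.

    A result of 1.0 means both chips do the same work per clock;
    0.6 means chip A does 60% of chip B's work per clock.
    """
    return (score_a / mhz_a) / (score_b / mhz_b)


# Placeholder scores and clocks, chosen only to illustrate the arithmetic.
ratio = clock_normalized_ratio(score_a=1500, mhz_a=500, score_b=2250, mhz_b=450)
print(f"chip A delivers {ratio:.0%} of chip B's per-clock score")
```

Normalizing this way removes the clock-speed difference from the comparison, which is why a 60% figure says something about the core (or about how the sample was configured) rather than about how fast it happened to be clocked.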

Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.

"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--e.g. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD, which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.

Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.

Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 are scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.

The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384KB total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256KB total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
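
For anyone who wants to see where the 384KB and 256KB totals come from, here is a rough Python sketch. The L1 sizes (128KB for the Tbird, 32KB for the Coppermine) are not stated above and are filled in as assumptions, the inclusive model is a simplification, and the 5% L1 miss rate is a made-up placeholder held equal for both chips, which it would not be in practice since the L1s differ in size.

```python
def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Unique data an L1+L2 pair can hold, in KB.

    Exclusive: L2 holds only lines that are not in L1, so the sizes add.
    Inclusive (simplified): L1 contents are duplicated in L2, so the
    unique capacity is just the larger of the two levels.
    """
    return l1_kb + l2_kb if exclusive else max(l1_kb, l2_kb)


def l2_cycles_per_access(l1_miss_rate, l2_latency_cycles):
    """Average cycles per memory access spent waiting on L2 (L2 misses ignored)."""
    return l1_miss_rate * l2_latency_cycles


# Assumed L1 sizes; the L2 sizes and latencies are the ones quoted above.
tbird_kb = effective_capacity_kb(l1_kb=128, l2_kb=256, exclusive=True)
coppermine_kb = effective_capacity_kb(l1_kb=32, l2_kb=256, exclusive=False)

# Placeholder miss rate, identical for both chips purely for illustration.
MISS_RATE = 0.05
tbird_wait = l2_cycles_per_access(MISS_RATE, 11)
coppermine_wait = l2_cycles_per_access(MISS_RATE, 7)

print(tbird_kb, coppermine_kb)
print(f"L2 wait per access: Tbird {tbird_wait:.2f}, Coppermine {coppermine_wait:.2f}")
```

The sketch only shows the shape of the trade-off: the exclusive design buys capacity, and every L1 miss then pays the longer L2 latency.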

Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.

"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on a similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.

The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 GB/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
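
The bandwidth and clock figures in that list can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch, not anything from Intel: the bus widths and transfer rates (a 64-bit FSB at 400 or roughly 533 million transfers per second, a 256-bit L2 path running at a 1.5 GHz core clock) are assumptions picked because they reproduce the quoted numbers. The last line uses only the nominal 12- and 19-clock penalties quoted above.

```python
def bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s, with 1 GB taken as 1e9 bytes."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# Assumed widths and rates; only the resulting GB/s figures appear in the post.
fsb_low = bandwidth_gb_s(64, 400e6)        # 64-bit bus, 400 MT/s
fsb_high = bandwidth_gb_s(64, 1600e6 / 3)  # 64-bit bus, ~533 MT/s
l2_to_core = bandwidth_gb_s(256, 1.5e9)    # 256-bit path at 1.5 GHz

# "Double-pumped" means two operations per core clock.
fast_alu_ghz = 2 * 2.0

# With nominal penalties of 12 (P6) and 19 (P4) clocks, the P4 loses the same
# number of cycles to mispredicts if it mispredicts this fraction as often.
break_even_mispredict_ratio = 12 / 19

print(f"FSB: {fsb_low:.2f} to {fsb_high:.2f} GB/s, L2->core: {l2_to_core:.0f} GB/s")
print(f"double-pumped ALU at a 2.0 GHz core clock: {fast_alu_ghz:.1f} GHz")
print(f"break-even mispredict ratio: {break_even_mispredict_ratio:.2f}")
```

On nominal penalties alone, the P4 would need roughly a third fewer mispredicts to break even; the rescheduling argument above suggests the real gap is smaller than that.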

It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note: WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.

Blah blah blah, biased statements towards Ace's.

Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you.

Re:This isn't a discussion about design philosophy by ToLu+the+Happy+Furby · 2000-10-23 13:27 · Score: 3 · on AMD vs Intel: CPU Design Philosophy

Ugh. Ignorant crap getting a +4 insightful. Well, let's get this over with...

Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.

Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.

The article itself doesn't say anything the knowledgeable don't already know.

This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependancy scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)

In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.

Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Danish, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.

Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.

I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.

Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.

ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.

So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.

Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?

In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.

Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII lead to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):

Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.

...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.

Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design . Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.

Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.

"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.

Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD intruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.

Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.

The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384Kb total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256Kb total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.

Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.

"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. 
But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.

The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 Gb/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.

It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains the sorts of design changes necessary not only to *double* clock speed on a given process over the P6 (note:WOW), but also to *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4s are released and benchmarked.

Blah blah blah, biased statements towards Ace's.

Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you.

Real sites? by Dr+Caleb · 2000-10-23 00:42 · Score: 2 · on Journalistic Integrity in the Digital Age?

There are lots of sites out there that provide news. The Associated Press has a long, rich history of providing "the facts", which they rigorously check. For local news, I check places like Canadian Online Explorer, The National Post or The Globe and Mail. While I admit some of these have some bias, being controlled by large corporations, they still have a long, rich tradition. The Globe and Mail, for example, is over 100 years old.

For tech news, I check BBC Tech News, Ace's Hardware, Tom's Hardware, or Ars Technica. ZDNet has become way too sensational and biased. And all the crappy banners! More like The National Enquirer of geekdom.

For discussions, I check K5 or Rootprompt. And Slashdot. But it's tough to have a discussion here anymore.

I'm sorry to say, but Slashdot, while I check it regularly, is starting to have too low a signal-to-noise ratio. Not enough "discussion", too much "babooey to natalie portman's beowulf cluster of hot grits and penis bird on toast."

It's safer to stay off the main page if I want some interesting discussion. As well, I don't tolerate mistakes in my profession. No matter what I do, I like it to be as perfect as humanly possible. While I know mistakes happen, there have been far too many here, adding to the noise and reducing my faith in the accuracy of the articles.

I get my news elsewhere, but I still come back, hoping the old days will return.

Re:It's all about the portable libraries by Simon+Brooke · 2000-10-19 18:43 · Score: 3 · on Internet C++: Competition For Java And C Sharp?

I think you're underplaying the performance problems of Java. I've been using some XML libraries, and was absolutely shocked by the bad performance. In this application at least, Java is at least 10-100 times slower than C++ code.

Bad Java is slower than good C. Good Java is faster than bad C. In actual comparative benchmarking, Java is faster than C in two of the three tests done. I'm not aware of any recent benchmarking which has come to the opposite conclusion. If your Java is slower than your C, that's your coding, not Java.

The JVM is not a millstone; on the contrary, it is extraordinarily powerful technology. JITs are better, of course. But static native compilation is not only unnecessary; in real testing it confers no benefit.

"If only Apple could be persuaded to use these..." by ToLu+the+Happy+Furby · 2000-10-17 12:49 · Score: 5 · on Is IBM's Power4 A Threat To Alpha, Sparc, IA-64?

Heh! If only Apple would use these, the new iMacs wouldn't exactly be quite able to hit their price points. Paul (the author of the article) and some others were involved in a thread over on the tech forum at Ace's about (amongst other things) the expected cost of one of these puppies.

To quote Paul's response:

Maybe another way of looking at it is perhaps the price of four POWER4 known good die and the ceramic substrate and metal carrier totals $3000 (although I suspect that a tested and 100% functional ceramic substrate itself might approach or exceed $3000 in cost).

The real question is the cost of a fully assembled and tested, 100% functional, POWER4 8-way module? After all what are the chances one of these can be reworked if even just one of the 20,000+ solder ball joints was bad?

So for one of these 8-way on a chip jobs (unsure if they'll be offering 4-way configurations too or if those were just a prototype) it's looking like upwards of $10,000 just for IBM to fab, package, and test the darn things. Add in a system capable of feeding it the tremendous bandwidth it requires to run up to its full potential--8 GB/s to DRAM and a phenomenal 84 (!) GB/s I/O--and...ok, so I know Hemos was just joking when he made that comment about Apple, but you get the idea. These are MPUs you use to fold proteins and run gigantic dynamic-content websites, not surf the web and edit the home video of your kid's elementary school graduation.

On a related note, man these things oughtta show Intel a thing or two about how to marry clever instruction scheduling to brute-force functional units--forget about Itanium; it's gonna take a several-way McKinley system to even take a swing at these. And it oughtta show Sun a thing or two about the dangers of resting on the laurels of your marketing success when designing new chips. And, as Paul notes in the article, it really oughtta make Alpha engineers worry that for the first time, having the most elegant design may not guarantee the best performance. Compaq has an 8-way SMT Alpha core on the way as well (EV8); too bad the Alpha group's customary position in the world--stepped on and neglected by their corporate masters--means they haven't got the money or manpower to bring it to market until well after POWER4.

Re:SMP Athlons... by Burning1 · 2000-10-07 20:03 · Score: 3 · on What Happened To SMP For AMD processors?

By the way: AMD has no plans to cripple its Duron processors.

http://www.aceshardware.com/Spades/read_news.php?post_id=15000265 - all Durons will be SMP capable, and, AFAIK, they should take full advantage of the 133MHz / PC266 DDR motherboards.

SMP Athlons... by Burning1 · 2000-10-07 19:47 · Score: 4 · on What Happened To SMP For AMD processors?

You are correct in regards to the processor's support for SMP. The current crop of Athlons (including the Athlon classic) are SMP capable.

In fact, they are SMP limited by the chipsets: If a chipset existed, you could run a box with 1000 Athlon processors - of course, designing such a beast would be impossible...

At any rate, Ace's Hardware has been covering AMD's products fairly diligently. They've posted several articles about the 760MP (the SMP-capable Athlon chipset). One good example is available here: http://www.aceshardware.com/Spades/read_news.php?post_id=10000214

From what I understand, the 760MP should be finished between December and January, and on store shelves late Q1 2001.

I wouldn't mind... by Burning1 · 2000-10-01 18:41 · Score: 1 · on Interesting Moderation Proposal

I wouldn't mind a +1, KarmaWhore...

I mean, at least then things will be honest, rather than moderating people up +1, Insightful for:

"Ace's Hardware has been covering AMD's advances fairly well - they have an SMP Capable chipset (the 760MP) in the works, which should be out by december." ;-)

AMD by Burning1 · 2000-09-29 06:24 · Score: 3 · on Intel Cancels its Timna chip

Ace's Hardware has been covering AMD's advances fairly well - they have an SMP-capable chipset (the 760MP) in the works, which should be out by December.

A little bit of info here: http://www.aceshardware.com/Spades/read_news.php?post_id=10000240

As for Compaq: well, make what you wish of this: http://www.theregister.co.uk/content/archive/12224.html

I'd say that AMD is setting itself up to replace Intel rather quickly. Already, many OEMs are dropping their Intel-only policy...

In Depth UltraSparc Info by Scooter[AMMO] · 2000-09-26 23:41 · Score: 3 · on Sun's UltraSPARC III Processor Shipping

Back in February, Ace's Hardware had a really great in-depth article on the UltraSparc series.

It starts by covering the history of the SPARC architecture, and what their naming conventions mean (e.g. what is the difference between a US I, a US II, and a US III). It then looks at the design decisions that were made for the US3, which included previous UltraSparc binary compatibility, reducing load latency, pipelining, branch prediction, and scalability. The discussion of all these topics is rather technical.

The article is long, and the techno-babble may scare off some, but if you have any knowledge of basic CPU operation, particularly of RISC cores, or if you are just curious about some of the quirks related to designing a CPU, you'll eat that article up.

more G450 fun by Anonymous Coward · 2000-09-04 20:12 · Score: 1 · on AMD on Celeron/Matrox Intros the G450

Matrox G450: Budget DualHead Graphics. Ace's Hardware also has a review, which includes DVD benchmarks.

cool Java Projects by Stu+Charlton · 2000-08-20 00:12 · Score: 5 · on A Java-Based Handheld OS

There are tons of cool Java projects, but no, I don't see people dismissing Sun, except a few armchair critics on Javalobby and Slashdot.

Sun has done a lot for the community, and continues to do so. They've made PR mistakes, but so does every company. Instead of mulling over political agendas, let's look at the results:

- JDK 1.3 is a massive improvement in client side performance, since most of Swing was optimized and Hotspot 2.0 client VM came out. This is a tremendous success.

- The J2EE is really catching on and addressing some of the original concerns of doing cross-platform enterprise applications. EJB 2.0 actually does a lot to improve upon the way it integrates objects with databases. The COM bridge will allow us to talk to EJB apps through Visual Basic (which is crucial in the business world). All in all, job well done

- The community has never been bigger or better. JavaOne continues to be the largest technical conference in the world, with over 25,000 attendees -- almost quadruple the Microsoft PDC attendance figures. The community process continues to get new specifications added to it, while Sun focuses on bug fixing and "depth" issues as opposed to "breadth" issues. JDK 1.4 will be another "fixes + optimization" release, once again showing that Sun wants this thing to be SOLID.

I really don't give a crap if Java is an ISO or ECMA standard. It took C++ until 1998 (freaking 1998!!) to become an ISO standard. And there STILL are major compiler and library discrepancies 2 years later.

The fact of the matter is that I can write a large Java distributed system much faster than I can write it in C++, and it will be less buggy overall thanks to the absence of memory leaks and pointer smashes, and to better exception handling -- plus the fact that I can buy one of several solid application servers for Java. In C++ I can't do that; I can only really use Windows 2000/COM+ or BEA Tuxedo, or roll my own. If I need access to the VM source code, I have it available to me, though it will rarely be the case that the VM is at fault rather than my code.

Plus, server side Java code has been shown to be, at times, faster than optimized C code:
Click here for graphs plus source code. (Note: take the benchmark with a huge grain of salt. Run it for yourself, and be sure to change his baseline numbers in the source code to match your own baseline.)

Java is a very pleasant language to work in. It's not the best -- I'd prefer Smalltalk -- but given the market climate, and the vast selection of tools and server products, it's quite livable if you love object oriented design.

Re:VLIW = Very Long Instruction Word by ToLu+the+Happy+Furby · 2000-08-11 01:35 · Score: 2 · on AMD Releases X86-64 Architecture Programmers Overview

As a matter of interest, what's your take on Crusoe's VLIW instructions? To what extent can the problems with VLIW be circumvented by running the compiler at run-time?

My initial guess would be that you can do it to some extent, provided your compiler is sophisticated enough, and you're prepared to compile several versions of the same source for different input data.

My guess would be similar--that, by looking for dependencies at run-time instead of compile-time, Crusoe has a much better shot at generating fast code and keeping the CPU small, cool, and simple. After all, with their approach, they are successfully able to do what IA-64 can only promise: move all the complexities of instruction scheduling from hardware to software.

Or are they? Because Crusoe (the hardware) is a straight-up in-order VLIW chip, its runtime compiler is actually doing two things: recompiling compiled x86 (or whatever) instructions, and scheduling them. Most of the criticism levied at Crusoe's approach focuses on the first half of this equation, and proceeds along the lines that "JIT is a bad idea, because it's why Java is so slow." As it turns out, I couldn't disagree more. For one thing, much of the reason Java is slower than C/C++ is because it is safer and more OO--it runs its own garbage collector and forces everything to be an object, amongst other things. For another, it's not actually slower! The newest Java JIT's manage to generate faster code than static compilers in many cases--and well they should, because they know more about the machine they are compiling to and the most common critical paths through the software than a static compiler ever could. Indeed, HP is working on a runtime interpreter which will speed up almost any precompiled code.

The reason JIT's can work so well is because they only need to compile the code once, then sit back and profile it, recompiling only when necessary. In other words, they incur a lot of overhead at first, but pretty much stay out of the way afterwards unless they'll really help out.
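The "compile once, profile, recompile only when necessary" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real JIT (HotSpot, Crusoe's code morphing, or anything else) is implemented: the class `TinyJit`, the `hot_threshold` parameter, and the stand-in compiler `compile_square` are all hypothetical names invented for this illustration.

```python
class TinyJit:
    """Compile a block on first use, count runs, and recompile once it is hot."""

    def __init__(self, hot_threshold):
        self.hot_threshold = hot_threshold
        self.blocks = {}      # block_id -> {"code", "runs", "optimized"}
        self.compiles = 0     # how many times the compiler was invoked

    def run(self, block_id, compile_fn, *args):
        entry = self.blocks.get(block_id)
        if entry is None:
            # First sighting: pay the compile overhead once, without optimizing.
            entry = {"code": compile_fn(optimize=False), "runs": 0, "optimized": False}
            self.blocks[block_id] = entry
            self.compiles += 1
        entry["runs"] += 1
        if not entry["optimized"] and entry["runs"] >= self.hot_threshold:
            # The profile says this block is hot: recompile it once, optimized.
            entry["code"] = compile_fn(optimize=True)
            entry["optimized"] = True
            self.compiles += 1
        return entry["code"](*args)


def compile_square(optimize):
    """Stand-in compiler: both versions compute x*x; a real JIT would emit faster code."""
    return lambda x: x * x


jit = TinyJit(hot_threshold=5)
results = [jit.run("square", compile_square, n) for n in range(10)]
print(results, "compiles:", jit.compiles)
```

Ten runs of the block cost only two trips through the compiler; every other run goes straight to cached code, which is the "stay out of the way afterwards" behavior described above.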

Now on to the second half of Crusoe's compiler--the scheduler. As I mentioned before, this sounds like a good idea--taking some functionality off of hardware and moving it into software. But when you think about it, you realize there's no such thing as "taking functionality off of hardware and moving it into software"--after all, the "software" still needs to execute on the same hardware!

What you're actually doing, then, is moving a function from having dedicated on-chip hardware performing it to having to be run with normal general-purpose hardware. This still has the very real benefit of making the chip a lot simpler, but now you've added a scheduler that needs to take clock cycles from the code it's trying to schedule.

The big question, then, is how much can the scheduler be like the compiler--that is, just doing its work once and then only stepping in when necessary. If, by moving the scheduler from dedicated on-chip logic to software using general-purpose logic, you make it able to do that much better, then it may be a significant design win. If, however, the scheduler needs to do anywhere near as much work as it would have as dedicated on-chip hardware, you're going to end up losing speed.
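The trade-off in that question can be made concrete with a little arithmetic. The sketch below assumes a very simple cost model (a fixed cost per software scheduling pass, valid for some number of runs); every number in it is invented for illustration, and the function name `total_clocks` is hypothetical.

```python
def total_clocks(runs, clocks_per_run, schedule_cost, reschedule_every):
    """Clocks to execute a block `runs` times when scheduling is done in software.

    reschedule_every: how many runs one scheduling pass stays valid for.
    """
    passes = -(-runs // reschedule_every)  # ceiling division
    return runs * clocks_per_run + passes * schedule_cost


# Hypothetical workload: a 100-clock block run 10,000 times,
# with a 5,000-clock software scheduling pass.
schedule_once = total_clocks(10_000, 100, 5_000, reschedule_every=10_000)
schedule_often = total_clocks(10_000, 100, 5_000, reschedule_every=10)

print("schedule once: ", schedule_once)   # overhead is lost in the noise
print("schedule often:", schedule_often)  # overhead dominates the real work
```

If the scheduler behaves like the compiler and runs once, its cost is half a percent of the total here; if it has to step in every ten runs, it costs five times as much as the code it is scheduling. Which regime a real software scheduler lands in is exactly the open question.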

Which of these is the case? I have no idea. Obviously, a lot of very smart people (Dave Ditzel, etc.) thought the former. On the other hand, Dave Ditzel is reportedly the one responsible for keeping Sun's chips in-order while the rest of the world moved to out-of-order; a quick comparison between an Alpha 21264 and an UltraSPARC-II (or even the upcoming UltraSPARC-III) shows you who was right on that one. (Hint: not Dave Ditzel.)

What we do know is that Crusoe is a lot slower than Transmeta originally thought a runtime interpreted VLIW processor would be. There have been strong reports that they originally envisioned their processors would be able to beat leading-edge x86 chips handily, and only scaled back to the low-power market once their original benchmarks came back disappointing. Even at the low end of the scale, they're attracting a lot of ridicule amongst chip designers for trying to "reinvent the benchmark" because their chips can't compete. I happen to believe that (work/time)/power is a useful benchmark for the mobile arena; still, there's no denying that Transmeta would rely on traditional benchmarks if they could. Furthermore, it looks as if several chips may end up being able to compete with Crusoe even on (work/time)/power--StrongARM's, the much-maligned Cyrix III, and various other low profile simple RISC chips coming out of the woodwork to compete for the "mobile embedded" market.
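For anyone who wants to see what the (work/time)/power metric rewards, here is a tiny Python sketch. The two chips and all of their numbers are hypothetical, chosen only to show how a slower chip can still win on this measure; `perf_per_watt` is a name made up for the example.

```python
def perf_per_watt(work_units, seconds, watts):
    """(work/time)/power: throughput divided by power draw."""
    return (work_units / seconds) / watts


# Hypothetical chips running the same 100-unit job (numbers are assumptions).
fast_hot_chip = perf_per_watt(work_units=100, seconds=10, watts=20)
slow_cool_chip = perf_per_watt(work_units=100, seconds=20, watts=5)

print("fast, hot chip: ", fast_hot_chip)
print("slow, cool chip:", slow_cool_chip)
```

The slow chip takes twice as long but draws a quarter of the power, so it scores twice as well on this metric while losing any traditional benchmark, which is the gap between the two kinds of measurement that the paragraph above is pointing at.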

So, while I would very much like Transmeta to succeed, so far--just as with IA-64--there's little indication that it's more than a bunch of hype. Perhaps after a disappointing first iteration, VLIW will get its kinks worked out and become the standard general-purpose CPU design philosophy of the next couple decades, just as RISC has been for the last two. (And yes, modern x86 chips are designed according to the RISC philosophy, even though the x86 ISA is CISC.) However, I have to say I doubt it. Looking ahead, all the badassest designs of the future (MAJC, Power4, 21464, SledgeHammer) seem to be moving towards keeping the dynamically scheduling RISC architecture and adapting it for CMP--chip level multiprocessing.

But, as always, only time will tell.

In defense of Rambus by evanbd · 2000-08-10 22:39 · Score: 5 · on Intel To Pull Plug on RAMBUS, Use SDRAM?

Before you moderate this "troll," realize that it is the result of informed research. Mostly from relatively recent articles on Ace's Hardware.

Basically, RAMBUS has the theoretical capability to be significantly faster than SDRAM (not DDR, more later). However, the controllers have problems that prevent this. RDRAM can keep many pages open and many devices active at a time (more than SDRAM), but the i820 doesn't do this. So the chipset is crippling the RDRAM. Also, as soon as multiple devices are put on the bus, the latencies increase, so if too many chips are present things slow down. This is because of the longer wires needed. At 400MHz (not 800; it's DDR) that really matters. Also, RDRAM has been hindered by low yields and hence higher cost. It is now down to about double PC133 (see pricewatch). Also, the chips are more complex. However, the specs say that a good controller ought to be able to outperform PC133. Not by huge amounts, but by enough to matter. i820 is far from a good controller. Something to think about: the EV7 (maybe EV68, I can't remember) is going to use RDRAM (also on Ace's Hardware). However, it is going to increase performance by using 8 channels in parallel. So until there is a good desktop controller, and RDRAM is similar in price, AND the benchmarks say it's better, I'm using DDR SDRAM. But the technology isn't inherently bad; it's just having more than its share of problems.
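The peak-bandwidth side of this comparison is simple arithmetic, sketched below in Python. The clock rates come from the post; the bus widths (a 16-bit RDRAM channel, a 64-bit SDRAM DIMM) are standard figures but are not stated in the post, so treat them as assumptions, along with the function name `peak_mb_per_s`. These are theoretical peaks and say nothing about latency, which is where the post locates the real problems.

```python
def peak_mb_per_s(clock_mhz, transfers_per_clock, bus_width_bytes, channels=1):
    """Theoretical peak bandwidth in MB/s (decimal megabytes)."""
    return clock_mhz * transfers_per_clock * bus_width_bytes * channels


# Assumed widths: 16-bit (2-byte) RDRAM channel, 64-bit (8-byte) SDRAM DIMM.
rdram_one_channel = peak_mb_per_s(400, 2, 2)            # 400 MHz, double data rate
sdram_pc133 = peak_mb_per_s(133, 1, 8)                  # 133 MHz, single data rate
rdram_eight_channels = peak_mb_per_s(400, 2, 2, channels=8)

print("RDRAM, 1 channel: ", rdram_one_channel, "MB/s")
print("PC133 SDRAM:      ", sdram_pc133, "MB/s")
print("RDRAM, 8 channels:", rdram_eight_channels, "MB/s")
```

On paper a single RDRAM channel has roughly a 50% peak advantage over PC133, which fits "not by huge amounts, but by enough to matter", and eight channels in parallel multiply that by eight.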

---

Re:Well, this is supposedly a new design... by ToLu+the+Happy+Furby · 2000-08-10 14:41 · Score: 2 · on New GHz Competitor In Processor Market Soon

Insightful post. One addition:

considering that the wholesale price of 1 GHz Athlon Thunderbirds was just dropped, and will probably drop again within a month or 2, it'll only be a few months before a 1GHz Athlon can be bought for about $500 or less.

Dunno if you heard, but the price of the 1 GHz TBird is going to drop to $539 on Monday. So rather than a few months until it's about $500, you only need to wait 4 days...

And one nitpick: while the chip we're talking about is supposed to be a redesign, it won't be much of one. Rumor has it the Samuel 2 will add an improved (i.e. fully clocked) FPU and 64 KB of L2 cache. We'll see. While both of these are very badly needed improvements, the fact is that the Samuel 2 will still be a very simple in-order chip trying to compete with the massively superscalar and aggressively out-of-order designs of AMD and Intel. As such it just might end up being a great chip for cheap low-power devices, due to its small die size, but it has no chance of offering a viable (pun intended?) alternative for the desktop.

Re:1GHz Samuel won't be that great... by ToLu+the+Happy+Furby · 2000-08-10 13:46 · Score: 2 · on New GHz Competitor In Processor Market Soon

It's disappointing that the Tom's Hardware article doesn't contain an architectural block diagram of the CPU, or much information about how much gets done per clock.

Check out the review at Ace's. It doesn't quite contain a full block diagram, but does a much better job than Tom's at discussing the architecture of the chip.

The summary: whether to try to keep die size/power consumption low or because they didn't have the expertise, Centaur (the ones who actually designed this chip; like Cyrix, later bought by VIA) decided to keep it very simple. Thus, the chip is in-order, with a relatively deep pipeline (11 stages), a relatively large 128 KB L1 cache (it needs it to try to make up for the fact that, as an in-order chip, it has to sit idle while waiting for any memory access), and just 4 execution units--2 SIMD, 1 ALU and 1 FPU.

The good news: it's just 76 mm^2 (on a not-very-good ".18 micron" process), consumes very little power, and runs cool enough to forego a fan. The bad news: no L2 and a half-clocked FPU mean laughable performance as a desktop or even a laptop chip. It might do ok compiling kernels and web browsing, but anything requiring a decent cache or any FPU at all (i.e. playing games, encoding MP3s, anything involving 3D, and even plain old office apps) and you're better off with last year's Celeron or K6-2.
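Why an in-order chip with no L2 leans so hard on its L1 can be sketched with the usual effective-CPI formula. All of the inputs below are invented for illustration (they are not measurements of the Cyrix III or any other chip), and `effective_cpi` is a hypothetical helper name.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, l1_miss_rate, miss_penalty_clocks):
    """Average clocks per instruction when every L1 miss stalls the pipeline.

    With no L2, each L1 miss is assumed to pay the full trip to main memory,
    and an in-order core is assumed to do no useful work while it waits.
    """
    return base_cpi + mem_refs_per_instr * l1_miss_rate * miss_penalty_clocks


# Assumed workload: 0.3 memory references per instruction, 70-clock DRAM penalty.
small_l1 = effective_cpi(1.0, 0.3, l1_miss_rate=0.05, miss_penalty_clocks=70)
large_l1 = effective_cpi(1.0, 0.3, l1_miss_rate=0.02, miss_penalty_clocks=70)

print(f"smaller L1 (5% misses): {small_l1:.2f} clocks per instruction")
print(f"larger L1 (2% misses):  {large_l1:.2f} clocks per instruction")
```

Under these assumptions a 5% miss rate roughly doubles the clocks spent per instruction, so a big L1 is the only lever available short of adding an L2 or out-of-order execution.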

Re:export controls on high end processors by ToLu+the+Happy+Furby · 2000-08-10 13:29 · Score: 2 · on New GHz Competitor In Processor Market Soon

The US government maintains export controls on any processor with a performance of greater than 2 gigaflops, making it slightly difficult for some high tech companies and government agencies in India, China, and Russia to purchase high end Intel chips. By being a Taiwanese company, Via can potentially grab a huge, rapidly growing market from Intel and AMD, which may have legal difficulties selling chips from even their non-US chip fabs to these countries.

Incorrect, and absolutely irrelevant anyway. First off, those export restrictions were lifted a few months ago. And second, current Cyrix IIIs have their FPUs clocked at only half the chip speed, making the FPU performance of a CIII 550 roughly equivalent to that of a 6-year-old Pentium 200, or of a theoretical Athlon 90 or so. Rumor has it the FPU will be running at full clock speed by the time they hit 1000 MHz sometime early next year, but even then you're talking maybe "Athlon 400" performance, possibly worse. And even then, it appears it will only be any good at single-precision FP, not the double-precision which the export controls are concerned with.

In other words, nowhere near worth an export control. And nowhere near worth sticking in anything this side of a web pad, either. Frankly, this chip is just not very good.

Re:Now who's trolling? by tomwa · 2000-08-09 22:52 · Score: 1 · on C# Under The Microscope

there's not a benchmark to be seen

Uh? I assume we have different definitions of 'benchmark'. Running Conway's 'Life' (an intensively computational program), a Fibonacci calculator and a fast Fourier transform program, with C and Java implementations, and measuring their performance is what I call a benchmark. Could you perhaps enlighten me as to why it isn't?

FWIW, the Fibonacci test (not a very real-world test, I admit) shows that the IBM and HotSpot JVMs are both faster than optimised C. I'm a Java zealot, and even I don't believe that!
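For readers who have not seen this kind of test, a Fibonacci micro-benchmark looks roughly like the Python sketch below. This is not the code from the article being discussed (that compared C and Java implementations); it is only an illustration of the shape of such a test, and the helper `time_call` is a name made up here.

```python
import time


def fib(n):
    """Naive recursive Fibonacci: lots of calls and additions, almost no memory traffic."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def time_call(func, *args, repeats=3):
    """Return (result, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


result, seconds = time_call(fib, 20)
print(f"fib(20) = {result}, best of 3 runs: {seconds:.4f} s")
```

A test like this mostly measures function-call overhead, which is one reason it is "not very real-world": a runtime that inlines or otherwise speeds up hot recursive calls can look very good on it without that advantage carrying over to typical applications.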

Re:java by Icculus · 2000-08-09 04:41 · Score: 1 · on C# Under The Microscope

/. article:

C Faces Java In Performance Tests

and outlink to an article @ Ace's hardware.

Re:Just _how_ _old_ is this? (not very) by ashshy · 2000-08-01 18:54 · Score: 1 · on Yet Another K6 Series From AMD

Well... the announcements are many, many moons old, but actual notebooks with the + chips are still fairly new, and not very common.

Being a broke owner of an old K6-2, I am waaayyyy interested in sticking a 550 K6-2+ in my desktop, which both Ace's Hardware and Tom's Hardware say is not only possible, but also much better than a regular K6-2 550, or K6-3 400 (the highest clocked, realistically priced, Super7 chips out there).

Problem is, you can't get these chips hardly anywhere without getting a notebook around it... :P
-----
#o#

Re:Assorted rantings. by cheetah · 2000-07-23 02:08 · Score: 3 · on Intel Reacts to AMD

I would just like to point out a few things myself...

Point A) A senior VP at Intel Corporation, Albert Yu, indicated that Willamette would only be 30% faster than the Coppermine. The Willamette will not be that much faster than the Pentium III at the same clock speed. The most important thing to remember about the Willamette is that Intel has decided the way to increase performance is to raise the clock speed and sacrifice IPC (instructions per clock) efficiency. So you can't compare this transition from the P III to the Willamette to the transition from the 486 to the Pentium. The Pentium had a much better IPC than the 486; the Willamette will most likely have a lower IPC than the P III. But the Willamette is designed to scale to very high clock speeds. You only have to look at the Willamette's pipeline to see this design philosophy. The pipeline has 20 stages; this will make it easy for Intel to raise the clock speed. For more info on how pipeline length affects CPU performance, check out this web page: http://www.aceshardware.com/Spades/read.php?article_id=50. I would also like to point out that the double-clocked ALU is only one stage of the 20-stage pipeline. You do have a good point about the SSE2 instructions: if Intel can get a lot of developers to use them, it will make it much harder for AMD to claim they have the fastest FPU. But AMD shouldn't be afraid; they have just rolled out a new architecture that has a lot of room for growth. And I can't see Intel getting that far ahead of AMD. I would expect that the title of fastest x86 CPU will change hands many times in the next year, but Intel is not about to totally destroy AMD.
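The clock-versus-IPC trade in Point A comes down to performance being roughly IPC times clock speed. Here is a minimal Python sketch of that relationship. The 30% figure is the one attributed to Albert Yu above; the 0.8 IPC ratio is purely an assumption for illustration (no IPC figure is given in the post), and both function names are hypothetical.

```python
def relative_performance(ipc_ratio, clock_ratio):
    """Performance relative to the old chip: (new IPC / old IPC) * (new clock / old clock)."""
    return ipc_ratio * clock_ratio


def clock_ratio_needed(target_speedup, ipc_ratio):
    """How much higher the clock must be to hit a target speedup at a given IPC ratio."""
    return target_speedup / ipc_ratio


ASSUMED_IPC_RATIO = 0.8  # assumption: the new core does 20% less work per clock

needed = clock_ratio_needed(1.3, ASSUMED_IPC_RATIO)
doubled = relative_performance(ASSUMED_IPC_RATIO, 2.0)

print(f"clock ratio needed for a 30% speedup: {needed:.3f}x")
print(f"performance if the clock doubles:     {doubled:.2f}x")
```

Under that assumed IPC loss, the new chip has to clock about 1.6 times higher just to be 30% faster overall, which is why the deep pipeline and its clock headroom matter more than any single per-clock feature.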

Point B) I am not going to say I disagree with you; it's just that I think 3D rendering is not a good example. The average computer user will never do 3D rendering outside of a game. I think that speech recognition is a much better example... something that many people would use if it worked better...

Point C) The PPro was expensive because they had bad yields. This is the same reason that RDRAM is much more expensive today when compared to SDRAM.
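The link between yield and price in Point C is easy to see with a back-of-the-envelope calculation. The wafer cost, die count, and yields below are all made-up numbers (nothing here is real PPro or RDRAM data), and `cost_per_good_die` is a hypothetical helper.

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Spread the wafer cost over only the dies that actually work."""
    good_dies = dies_per_wafer * yield_fraction
    return wafer_cost / good_dies


# Hypothetical wafer: $3000 to process, 100 candidate dies on it.
good_yield = cost_per_good_die(3000, 100, yield_fraction=0.8)
bad_yield = cost_per_good_die(3000, 100, yield_fraction=0.2)

print(f"80% yield: ${good_yield:.2f} per good die")
print(f"20% yield: ${bad_yield:.2f} per good die")
```

The wafer costs the same either way, so a fourfold drop in yield means a fourfold rise in the cost of each part that can be sold.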

Re:Where's the DOJ now? by Caktus · 2000-06-25 02:31 · Score: 1 · on Hidden Consequences: Rambus And DDR SDRAM Prices

I think people are putting this out of context.
From Ace's Hardware:

"Finally, I also want to make note of this Register story in which Craig Barrett of Intel responded to the recent licensing deals by reminding us that DRAM makers can produce clean-room versions of SDRAM and DDR SDRAM technologies (much like Compaq with the PC BIOS, and so on)."
