...But rather encryption to restrict the recipient's ability to access the data after a certain period of time (a week). In truth, it does both very badly.
First, it is clear that this cannot be a serious attempt at the "traditional" problem of encryption--for the reason pointed out in many posts (insecure channel between sender and Yahoo!) as well as a deeper one--this system requires you to give full trust to both Yahoo! and Zixit, as there is no proof whatsoever that they will even bother to encrypt your email when passing it between themselves. (And if you would trust a potentially life-and-death secret to two companies named "Yahoo!" and "Zixit" then you deserve what's coming to you.) Finally, there is a huge problem with verification: the recipient merely needs to "verify" that they actually hold the email address the sender specified. And how, pray tell, do they do that? Likely they instead need only temporary access to that account to receive a (plaintext??) email giving them a temporary password. Good lord.
Instead it appears to implement an access control restriction--your recipient can only access the email for 7 days before it is gone forever. Of course, this fails for the same reason all access controls fail--the message must finally be displayed in plaintext on an untrusted machine, namely the recipient's. Assuming "Zixit" has implemented some (hackable) fix to the "copy-and-paste attack" (a la the International Lyrics Server), there is still the ever pernicious "screenshot attack". And as always, even if the recipient's machine could somehow be entirely trusted, there is the final undoing of any access control restriction--the digital-to-analog conversion. Just as I can always tape-record the SDMI music coming out of my speakers, and videotape that DVD playing on my TV, this scheme falls rather easily to pen and paper.
Meanwhile, it doesn't even do the trick of "increasing the amount of encrypted emails the FBI has to look through", because all this traffic is presumably just SSL, and there's a whole bunch of that around. Besides, chances are the FBI/CIA/NSA/KGB/alien invaders would rather just install a keyboard sniffer or run a TEMPEST analysis on your computer than have to solve the FACTORIZATION problem or build huge special-purpose number sieves and spend several times the lifetime of the universe waiting around to read your email or invent a quantum computer. (Maybe the aliens would rather do the latter.) Or just bring a warrant to Yahoo!/Zixit, who *both* have full plaintext access to your "encrypted" email and will likely be very happy to comply with the FBI. (Or aliens pretending to be the FBI--has no one noticed how insecure and spoofable search warrants are?)
Um, I think what I'm saying is, this appears pretty lame. The only "useful" thing I can think of that this does is destroy the message if it is not accessed within 7 days. Of course, trusting this means trusting that 1) Zixit actually destroys the message; 2) Yahoo! destroys their copy of it; 3) no one intercepted it when it was passed in plaintext from the sender to Yahoo!; 4) any logs or copies of it as it propagated (in plaintext) across the Internet between the sender and Yahoo! were destroyed; 5) it was actually encrypted between Yahoo! and Zixit...
[If Sony was smart, they would FLOOD the market with cheap PSX2s to get a HUGE installed console base. They then would be able to sell loads of games for years and years]
That was Sony's plan, but IIRC, they switched the chip manufacturing process from a .18 to a .13, and there were some problems during the transition. They didn't want to pass up the Christmas season, so they released it (in limited quantities)
First, that's .25 to .18. Second, they never planned on releasing more than 2.5 million American PS2's by Xmas (now they will be lucky to hit 1 mil., as they are reported to have not only cut the launch numbers from 1 mil. to 500k, but have missed their subsequent shipping targets as well), because they knew they couldn't make more than that.
Third, they are already losing money--quite a lot of money--selling them for $300. (I've seen estimates of a loss of $170/console, but they are probably outdated.) Fourth, it's not like this was a brilliant stroke of marketing genius on Sony's part--all consoles are sold at a loss, always and forever; the money is made back (theoretically and then some) by licensing fees on every game sold.
Fifth and finally, a $99 PS2 would compound Sony's worst fear (and biggest miscalculation IMO)--that is, that everyone would buy a PS2 because it was the cheapest DVD player around, and not to play (read: buy) games for it. Then Sony loses money on the initial sale, and doesn't gain it back on license fees for all the PS2 games that no one is buying because they're all using their PS2s as DVD players. Indeed, it appears as if this is what has happened so far in Japan, where the ratio of PS2 games/PS2 consoles sold is abysmally low. (Of course, the initial round of Japanese PS2 games were themselves abysmal; the US ones, while still no better than the Dreamcast, were much improved.)
Right now there is a small selection of PS2 games, so there is little competition between developers, plus releasing a game early means sales throughout the life of the console, not just a few months. Releasing a game early in a popular console's life is very beneficial to the developer
Actually, this is wrong as well. The PS2 launched in America with a remarkable 27 titles; over 50 are expected by Xmas. (Contrast the other extreme, the N64--which launched in September of 1996 with just 2 or 3 games IIRC and had only like 7 by the end of the year, something awful like that.) The problem, of course, is that those 50 titles are going to be split amongst what now appears to be barely 1 million PS2 owners. Assuming a game/system ratio (for the year) of 3 (this may be generous considering everyone has already shelled out $300 for the console, plus more for a memory card or extra controller required for many of the best games), that's an average of 60,000 sales/game. And that, in the console business, is a colossal failure. Meanwhile, because the PS2 presents programmers with a notoriously steep learning curve, most of the initial games are quite unimpressive, so it's doubtful many will become long-lasting classics or boost the public's perception of the developer.
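The back-of-the-envelope average above is easy to check. This is only a sketch using the figures quoted in the post (roughly 1 million consoles, 3 games per console for the year, 50 titles); the function name is made up for illustration, and it assumes purchases are spread evenly across titles, which real sales never are.

```python
def average_sales_per_title(consoles, games_per_console, titles):
    """Average units sold per title if purchases were spread evenly."""
    return consoles * games_per_console / titles


# Figures as quoted in the post; all three are rough estimates.
avg = average_sales_per_title(consoles=1_000_000, games_per_console=3, titles=50)
print(f"{avg:,.0f} sales per title")
```

Since a few hit titles usually take most of the sales, the typical title would land below this average.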
A lot of developers are going to lose big with their initial PS2 games. While there are other forces at work, the general pace of a system's releases is chosen by the system's manufacturer. Many are predicting a lot of developers very unhappy at a Sony which goaded them into glutting the PS2 game market while being unable to fill their end of the bargain by making PS2 consoles. Many developers are already complaining about the PS2's extremely unorthodox insides and complete lack (at this point) of high-level programming libraries. There are several other options out there... So far it looks like a disaster of a launch for Sony; the question is whether the PS2 has enough power, hype and support to rule the market anyways.
What makes Transmeta special is that they have put a dynamic binary translator in a chip and have developed silicon to make it faster.
No, actually you have it backwards. Intel (and later NexGen, AMD and I believe Cyrix) put a dynamic ISA translator *on their chips* starting with the P6--they decode (i.e. translate) x86 instructions into internal "u-op" instructions (AMD calls them "macro-ops", same idea) which are used by the rest of the silicon. (This is necessary because x86 instructions are too heterogeneous in length and complexity to work well in a deeply-pipelined out-of-order core.)
What Transmeta did was essentially move this translator *off* the chip, into software. The advantage of this is simpler silicon, and therefore lower power consumption. (Also, all things being equal, higher maximum clock speeds; all things are clearly not equal.) A secondary advantage is that far more resources (16MB IIRC) can be devoted to buffering, tracing, analyzing and optimizing the instructions than on a chip, where the physical chip-size keeps buffers small and optimizations simple. The disadvantage is that all this needs to be run on general-purpose (i.e. slower) silicon--and worse, competes for CPU-time with the very programs it is trying to optimize. (Not to mention takes up 16MB of system resources.)
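To make the "translator in software, backed by a big buffer" idea concrete, here is a toy sketch. It is not Transmeta's actual design: the three-instruction "guest" ISA, the `Translator` class and every name in it are invented for illustration. The only point it shows is that a block of guest code is translated once, kept in a memory-resident cache, and reused on later executions.

```python
# Toy illustration only: a made-up guest ISA translated into Python closures,
# with a cache so that each block pays the translation cost just once.

def translate_block(block):
    """Translate a tuple of (opcode, register, value) guest instructions."""
    ops = []
    for opcode, reg, value in block:
        if opcode == "load":    # reg = value
            ops.append(lambda regs, r=reg, v=value: regs.__setitem__(r, v))
        elif opcode == "add":   # reg += value
            ops.append(lambda regs, r=reg, v=value: regs.__setitem__(r, regs[r] + v))
        elif opcode == "mul":   # reg *= value
            ops.append(lambda regs, r=reg, v=value: regs.__setitem__(r, regs[r] * v))
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(regs):
        for op in ops:
            op(regs)

    return run


class Translator:
    def __init__(self):
        self.cache = {}         # guest block -> translated host callable
        self.translations = 0   # number of times translation was actually done

    def execute(self, block, regs):
        translated = self.cache.get(block)
        if translated is None:  # first time this block is seen: translate it
            translated = translate_block(block)
            self.cache[block] = translated
            self.translations += 1
        translated(regs)


hot_loop = (("load", "a", 1), ("add", "a", 2), ("mul", "a", 10))
t = Translator()
regs = {}
for _ in range(1000):
    t.execute(hot_loop, regs)
print(regs["a"], t.translations)
```

The tradeoff from the post shows up even here: the cache lives in ordinary memory and the translation step runs on the same processor as the program it is serving.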
So far the tradeoff has been (IMO) a big loser except in special circumstances--where you need long battery life, x86-compatibility (otherwise there are faster, smaller, more efficient chips out there, like anything in the ARM family), little weight (otherwise just use a bigger battery), and have efficient enough components for the rest of the system to actually make a difference (this is the gotcha with traditional laptops). Whether this particular set of circumstances will turn out to be a small or huge market niche, it is certainly a small problem space. Of course, much of the blame is due to TM's implementation rather than the (basically sound) idea; apparently their architecture is not up to Intel's standards (their process technology is IBM, so that's not the problem). Of course, mistakes are very common in the first iteration of a wildly new idea--witness Itanium (harnessing VLIW for very different ends--and arguably with less success) for proof of that.
Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.
Intel does offer their VTune compilers for sale, as they must in order to legally use them in the SPEC benchmarks where they perform so well. Unfortunately, there are widespread complaints and accusations that they are buggy and temperamental and fail to compile much code that works just fine with gcc, VS, etc. The charge that Intel gets its SPEC scores with compilers which are so optimized that they aren't robust enough for everyday use has tarnished Intel's very impressive SPEC scores among some. I haven't ever tried to use VTune so I can't comment as to whether this is FUD or not. It is worth noting that VTune is much much faster than anything else in SPEC, yet rarely used in practice, which at least suggests that something keeps people from adopting it.
But Intel does also help other compiler makers incorporate optimizations. I know they specifically work with Cygnus to optimize gcc, and would assume they do the same with MS. AMD also works with compiler makers to get support for 3DNow. (For market reasons--i.e. they will always have smaller market share--AMD designed the Athlon to perform well on P3-optimized code, and thus there is not so much to be gained by including K7 optimizations over and above 3DNow. The P4, on the other hand, is very different from both of them and needs a recompile to perform well, as these numbers demonstrate.)
Don't worry; the SPEC scores have been very poorly reported, while the P4's rather poor performance on non-optimized code has gotten all the press. You are by far not the only one to have missed the SPEC scores and assume that the P4 is a dud. Of course, in some ways this is valid, since SPEC scores are more indicative of the potential of the P4 core than of how well a P4 will perform on today's code. Still, as it turns out, the Alpha scores I was comparing the P4 against in my original post are for chips that won't be released until January; so technically, the P4 has not just the SPECint2000 crown (base and peak) but the SPECfp2000 crown as well (base only)!
Okay, I'm not heavy into hardware, I just wanted to point out the numerous problems with the P4 - it seems that the processor itself is not one of them!
I certainly wouldn't go out and buy a P4 today--a DDR Athlon is a much better deal for today's software. But the SPEC scores show that once we get some P4-optimized software, it's gonna kick butt. So, mediocre as a current product, great as a debut for a new core.
Tolu, you always have something insightful to say about chip design, but you have to repeat yourself fairly often across articles - have you thought about bugging the /. crew for some permanent space to soapbox?
Thanks!
Eh, I do get carried away too easily I suppose. It's always a problem when you feel like you have all this relevant information that many people reading may not know, and you don't know how much of it to repeat. (I generally tend to go for "all of it".) As for submitting an essay to /., I hadn't really thought about it, especially since I feel that there are many people out there who have a whole whole lot more knowledge on MPU design than me. Of course, as I seem to be one of the few who actually tries to enlighten the more software-minded crowd at /., I suppose it might be worth a thought...
The Pentium Pro was the last new core from Intel. And may I remind you - the first issue of the PPro - It beat the Alpha! For like a month, until DEC moved to a new process, Intel beat the king of the RISC chips. So I don't think there is the precedent you seem to see - when Intel brings out a new core, I expect bang - not "it'll scale".
ROFL!
No one noticed it, but you got bang today as well.
The original PPro was, for about a month, and only barely, the fastest chip in the world in SPECint95. The P4/1500 is...the fastest chip in the world in SPECint2000. Its SPECint2k scores are 522/535 base/peak; the fastest previously available processor in the world is an Alpha EV67/833 which scores 511/533. Considering Intel will almost certainly release a P4/1600 before Compaq finally releases a faster clocked Alpha, this gap will almost certainly become even larger for the P4. (And then Alpha will *finally* move from a .25um to a .18um process and kick some butt.)
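For a sense of scale, the size of that lead can be worked out directly from the scores quoted above. This is just arithmetic on the numbers as given in the post; the helper name is made up.

```python
# SPECint2000 scores exactly as quoted in the post (base, peak).
p4_1500 = {"base": 522, "peak": 535}
alpha_ev67_833 = {"base": 511, "peak": 533}


def lead_percent(ours, theirs):
    """How far ahead `ours` is of `theirs`, as a percentage."""
    return (ours / theirs - 1) * 100


for kind in ("base", "peak"):
    lead = lead_percent(p4_1500[kind], alpha_ev67_833[kind])
    print(f"{kind}: P4 ahead by {lead:.1f}%")
```

On these figures the margin is a couple of percent at base and under one percent at peak.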
Even more spectacularly, the PPro shocked the MPU world by being somewhat competitive with the fastest RISC chips in SPECfp95--about 75% the top Alpha scores. Meanwhile, the P4/1500 put up SPECfp2000 scores of...549/558 base/peak, or roughly 90% those of the fastest Alpha.
And yet, just as when the PPro was launched, all we hear about is how the P4 is a failure because it performs poorly on legacy apps. The P6 launched to universal derision from the mainstream computer press because it wasn't any faster than an ordinary P5 at 16-bit apps (yeah, maybe eventually there might be *some* 32-bit apps, but who's going to rewrite their code just to optimize for some new-fangled processor?); it was perhaps the most successful MPU design in history, as predicted by its astonishingly good SPEC95 scores.
The P4 is launching to universal derision from the mainstream computer press because it isn't any faster than an ordinary PIII or Athlon at x87 apps and apps which use instructions which the P4 explicitly deemphasizes in favor of faster replacements (yeah maybe eventually there might be *some* SSE2 and P4 optimized apps, but who's going to recompile or even rewrite some of their code just to optimize for some new-fangled processor?); its SPEC2000 scores are just as astonishing as the PPro's were, if not more so.
Say what you want. My mom's roommate was an election monitor in Chicago in 1960. She showed up, and she was escorted from the room and told that, were she to actually try to monitor anything, she would likely come to grievous physical harm. She went home.
Of *course* there was no "proof". What kind of idiot rigs an election and leaves *proof* lying around?
There were substantial investigations into election irregularities, not just in Illinois but in 11 other states. In Illinois there was a wide-ranging recount in all of the contested precincts--including, one must assume, your mother's roommate's--and they came back only +943 for Nixon, when he was behind by 4500 votes. (Compare, of course, to the current Florida recount, +1457 for Gore from an initial discrepancy of 1700 votes.)
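The comparison is easier to see as the share of the original gap that each recount closed. A quick sketch using only the vote counts quoted above (the function name is made up):

```python
def share_of_gap_closed(votes_gained, initial_gap):
    """Fraction of the trailing candidate's deficit recovered by the recount."""
    return votes_gained / initial_gap


# Numbers as quoted in the post.
illinois_1960 = share_of_gap_closed(votes_gained=943, initial_gap=4500)
florida_2000 = share_of_gap_closed(votes_gained=1457, initial_gap=1700)

print(f"Illinois 1960: {illinois_1960:.0%} of the gap")
print(f"Florida 2000:  {florida_2000:.0%} of the gap")
```

On those figures the Illinois recount recovered about a fifth of the deficit, while the Florida recount has recovered most of it.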
Then the Republicans took the case to federal court, where it was thrown out due to lack of evidence. Then they took it to the state electoral board, where it was *also* thrown out due to lack of evidence. Note that no one has ever accused either the state electoral board or the federal courts of falling under the influence of the Cook County Democratic machine; indeed, the electoral board was composed of 4 Republicans and only 1 Democrat.
What sort of lack of evidence?? Well, despite over a month with Republican officials crawling the state, demanding recounts, and generally trying their darndest to come up with anything suggesting irregularities, they were not able to present a single affidavit from either an intimidated voter or a cheating election official--like your archetypical "mother's roommate". Indeed, if she had come forward with her story--and believe me, it wouldn't have been difficult as there was national press coverage of the Republican team then scouring the state looking for such stories--then she would have constituted the single piece of solid evidence that we have today that such election irregularities actually existed. The only one.
Sorry, but I don't buy it.
More to the point, that isn't to say that it didn't happen, only to say that the Nixon campaign went as far and farther than the Gore campaign looks likely to do in questioning the results of the election, and they lost. If the Gore campaign manages to draw this election out with challenges and legal action, it may be distasteful and ungracious of them, but it will be no different from what the Nixon campaign did in 1960, popular myth to the contrary.
This is why math professors should stick to math. (Note that I'm a math major, so perhaps I should as well. ;)
Not that the longwinded article linked here actually got around to talking about Natapoff's "proof" in any detail, but from what I could piece together, all he proved is that under an electoral college system, each voter has a greater chance of deciding the election with his or her vote than under a direct election.
Well no shit, Sherlock. That's why we're sitting here watching a recount come in dozens of votes at a time, arguing about a couple hundred blind old ladies, and fretting about whether more Floridians overseas are serving in the military or dual citizens of Israel. OF COURSE a smaller number of voters has a larger chance of deciding an election under the electoral college system.
In other words, the e.c. is considerably more unstable and capricious than a direct election. There is a much greater chance that the true will of the people will not be reflected in the final result. Why we need a mathematical proof to investigate this is not totally beyond me, because it's an interesting combinatorial result (I'd assume). Why this Natapoff guy actually thinks this is a good thing, though, is utterly beyond me.
His best argument (according to the article) is that we don't complain that the World Series is determined by who wins the most games, not who scores the most runs. Putting aside the fact that the two situations are *not* analogous (for one thing, the fact that there is a different starting pitcher for any given 4-5 games in a row is the most important argument for why we need a best-of-7 Series), the point here is that the World Series is put on for the purposes of *entertainment*, not of deciding who rules the free world. Not that I'm not having a lot of fun with these election results (side note--I helped elect a corpse! Whaddya think of that!), but there's an argument to be made that instability and lack-of-representation in results, while good for sporting events, are actually *bad* for presidential elections.
Furthermore, he shows absolutely no understanding of the greater "rules" of the electoral college "game". For example, the electoral college has, throughout the course of US history, served to prolong and promote slavery and remove incentives for granting female suffrage or encouraging higher voter turnouts. For some excellent explanation why, why don't we read a *relevant* article by someone who's actually qualified to talk about the electoral college, Akhil Reed Amar (Yale professor and one of the foremost academic experts on Constitutional law).
Fascinating article, but the upshot of it is that contrary to what is being stated in your post, there is no proof that voting fraud occurred in that election on a significant scale. More importantly, contrary to popular belief and nearly every major newspaper this morning, Nixon did not concede the election out of concern for the country's well-being, and neither did the charges go uninvestigated. Indeed, there were major investigations into allegations of voting fraud not just in Chicago but all over the nation, and all of them exonerated Kennedy.
In any case, it's important to remember that, due in no small part to the popular belief that he was robbed in 1960, Nixon got his presidency in 1968. So too did the two candidates in our history who actually were "robbed" by the electoral college (i.e. they won the popular vote but couldn't carry a majority in the e.c.), Andrew Jackson and Grover Cleveland--both won the presidency 4 years later.
Florida's electors could vote for whomever they damn well please.
As I understand, and I'm not an expert, but I heard this on the radio this morning, this can only occur in states that have "faithless electors". I'm not sure if Florida is one of them...if it is there is a chance on December 18 that they cast their vote in favor of who won the popular election.
It's more complicated than that.
Many states have provisions forbidding so-called "faithless electors"--i.e. electors who vote differently from the popular vote in the state. However, these provisions are all on the state level, not the federal level, and thus are (arguably) Constitutionally irrelevant to the actual Electoral College vote on Dec. 18.
I believe it is generally accepted that if an elector were to change his or her vote in a state with a provision against faithless electors, then the changed vote would stand but the elector would have some shit to pay with their home state. On the other hand, this would certainly go to the Supreme Court if it occurred, and given the current anti-Federalist leanings of conservative judges, that result may be too close to call as well.
The way Rambus handled the situation might not have been the best, but they DO have the patents and they aren't trivial "one-click" ones either.
No, they're exactly like "one-click" patents, except that the way Rambus went about securing and enforcing them is much slimier. Of course, the concepts involved may not seem "trivial" to *you*, but you're not an EE, so--no offense--your opinion isn't what's important here. Indeed, the fact that not only has every other memory company developed interfaces which are "infringing" upon these patents but that Geoff Tate could actually say, "We think it would be difficult, if not impossible, to develop a competing technology to RDRAM and not infringe on our patents," strongly indicates that many of these patents are indeed obvious to someone skilled in the art.
Sort of like how "one-click" might seem like an inane concept to get a patent on to you, because you know that any decent programmer worth a salary could implement it; but to the average person, it involves a moderately impressive knowledge of cookies, databases, etc., and thus isn't trivial at all. Luckily, patent law provides that the former, not the latter standard is necessary. (Unfortunately, the USPTO seems to enforce only the latter anyways.)
As for Rambus' way of handling the situation being "possibly not the best"--no, it was a bit worse than that. What Rambus actually did was take part in JEDEC, the open industry-wide consortium developing the next DRAM standard, and secretly go about filing patents on the very standard being discussed. Either they simply failed to disclose the fact that they had already filed for patents on this technology (and possibly steered discussions into technologies sure to infringe on their patent applications)--which is not only immoral but illegal as well--or they actually went about filing these patents *after* the standards had already been agreed upon, which is hideously immoral and illegal.
The memory manufacturers know this. Companies will never pay royalties unless they absolutely, positively have to.
Not in the memory biz. Indeed, the reason the dramurai have been so reluctant to pay off Rambus is not that they're unused to paying royalties--the DRAM business is rife with royalties and cross-licensed patents; TI already gets patent royalties from every piece of DRAM produced, for example--but rather because they feel that Rambus has gone about getting these patents in a slimy underhanded way.
But this is balanced by the fact that before Samsung, every DRAM manufacturer licensing Rambus' patents was Japanese. Business in Japan is a much smarmier back-room affair than in the US, and giant Japanese corporations are used to working out complex patent licensing schemes on trivial patents. This is because Japanese patent law allows patents on ideas, not implementations, and trivial ideas at that--something that the US patent system "doesn't allow", or at least isn't supposed to. The result, of course, is a sluggish deflationary economy ruled over by a handful of giant patent portfolios--er, I mean corporations--in which all economic activity must first be cleared with complex licensing and back-room deals, and in which fighting off yet another extortive patent demand is almost unheard of. An economy which can't grow to save its life, despite interest rates of essentially 0%. The future of the US economy if we don't reverse the recent trend in the USPTO.
Now, the addition of Samsung, a Korean company, to the list of Rambus toadies might seem noteworthy and surprising, except that Samsung is the one dramurai which actually has an important business relationship with Rambus, being the only one currently producing RDRAM in any quantity (except Toshiba's RDRAM production for PS2). Thus they actually have a reason to fear the loss of their RDRAM license, and thus there are entirely sensible reasons why extortion should work on them, unlike the Japanese who give in to extortion for entirely cultural reasons. Indeed, it's sort of amazing--and a commentary on the validity of Rambus' claims--that Samsung actually held out this long.
When Micron and Infineon license these patents it will mean that they actually have merit (or that they've lost in court). Until then, this is just business as usual, and another sign of how the current patent regime is stifling progress and innovation.
Please tell me this statement isn't serious. I'm so sure that people that order CD's from a club are doing it with the express purpose of ripping off musicians.
That was the point of the statement--few people realize they're ripping off the artist when they buy from a CD club, but everyone has been "educated" into feeling guilty for downloading from Napster. But the reality of things is ironically quite the opposite--Napster costs the musician nothing and ends up selling more of his CDs, whereas CD clubs cost the musician a lot and sell very few of his CDs if any.
One is excellent free publicity, the other terrible publicity that the artist pays a lot for and the labels profit from sleazily. I wasn't accusing people who use CD clubs of being immoral--I used to use them before I knew the business model involved--just uninformed.
Napster isn't really a good example of true p2p filesharing (like gnutella, freenet, etc).
Keep in mind that Napster requires a central server to function, and that it is completely controlled by a corporation. I don't think this has any real implications for p2p.
Yes, that's true. The server doesn't necessarily have to be controlled by a corporation (duh), but as long as servers cost money most of them likely will. (Don't forget OpenNap, though!)
The same principles hold, though. If it is "illegal" for a corporation to grep a list of hyperlinks for a given string, it is presumably illegal for an individual too. Besides, Gnutella (and presumably Freenet; haven't used it) scales very poorly, and is somewhat unusable for the medium-term. The obvious solution--cache servers--runs into the same legal issues as Napster.
Yes, I realize that there are important legal differences--Napster collects hyperlinks from many people and searches them, as opposed to Gnutella, which only searches your own. (Freenet is like Napster in this regard, though.) And, again, I realize that this does not legally set a precedent.
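The distinction drawn here, a central service that collects everyone's links and searches them versus a node that only searches its own list, can be sketched in a few lines. This is an illustration of the two models as described in the post, not of the real Napster or Gnutella protocols; `CentralIndex`, `local_search` and the sample file names are all invented.

```python
class CentralIndex:
    """Central model: peers submit their links, the server searches all of them."""

    def __init__(self):
        self.links = []  # (peer, link) pairs submitted by users

    def submit(self, peer, links):
        self.links.extend((peer, link) for link in links)

    def search(self, term):
        # Effectively a case-insensitive grep over every submitted link.
        term = term.lower()
        return [(peer, link) for peer, link in self.links if term in link.lower()]


def local_search(own_links, term):
    """Decentralized model: each node only searches the list it holds itself."""
    term = term.lower()
    return [link for link in own_links if term in link.lower()]


index = CentralIndex()
index.submit("alice", ["Band - Song One.mp3", "Other - Tune.mp3"])
index.submit("bob", ["band - song two.mp3"])

hits = index.search("band")
print(hits)
print(local_search(["Band - Song One.mp3", "Other - Tune.mp3"], "tune"))
```

In both cases the search itself is the same string match; what differs is who holds the list being searched.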
It's just that this Napster case was probably our best opportunity to have a definitive legal ruling on the technologies involved. Among them: the right to collect user-submitted hyperlinks without checking each for copyright violation; the right to return the results of a grep on those links; the right to exercise fair use in a manner which is not easily distinguishable from infringement to someone assisting in that use (and the right to assist when you don't know if use is fair or infringing). Finally, it might have been nice--say--to have a definitive ruling on whether noncommercial sharing of copyrighted music is fair use (as per the AHRA, hundreds of years of copyright law) or infringing (as per some asinine act snuck through Congress to help prosecute MP3 sharing college students, I forget the name).
Now we'll have to wait, and when these issues come up, it's unlikely that we'll have the full force of a society which almost universally uses and supports Napster (amongst those with access to and knowledge of it)--or the legal expertise of a David Boies--behind us. It's a sad commentary, but if 35 million people were enthusiastic users of DeCSS, that trial probably would have gone differently. If these issues are decided upon in some "fringe" case--like DeCSS--rather than the Napster case, the results are more likely to be similar.
(Also I'm sad at the prospect of losing my Napster, since I will certainly refuse to pay out of principle.)
Correct me if I am wrong, but isn't BMG one of the CD clubs that you purchase a dozen CDs for some price and you don't have any further financial commitments (as compared to some others where you have to purchase a certain number of full-price CDs within a certain time frame)? They might keep it free and load you up on ads or make money selling demographic information about you instead.
BMG is a major record label. They run the BMG CD club, just like Columbia, another major label, runs Columbia House.
Ironically, it is these CD clubs, not Napster, which are stealing from artists. When you get your 12 CDs for 1 penny, those CDs don't just come free from the sky--they come directly out of the artist's pockets.
Yes, that's right. Not only is the artist not making any money (like Napster), they are actually losing it when you order their CD from a CD club. All those CDs are chalked up as "marketing costs", and billed to the artist--along with recording costs, studio time, tour costs, and other promotional costs. (The musician pays every last cent of the cost of recording and selling their album, but the label, not the artist, owns the copyright on their work.) Meanwhile, the label--not the artist, mind you--makes a huge profit by tricking people into paying for all those extra CDs that come along with membership. (For those who don't know, you don't have to pay for anything you don't order; just send it back.)
And finally, unlike Napster, no one who rips off a musician by ordering their CD through a CD club ever goes out and pays for it, because they already have the real thing. A disgusting practice, all in all--one which Napster was helping to end.
Does it bother anyone else that the concept of peer-to-peer file transfer just settled out of court? Yeah, I know, a settlement doesn't set a legally binding precedent. But something tells me we can all thank Napster for selling our rights down the drain.
Of course, it's hard to deny that a settlement probably is in Napster's best interests. Maybe this just means that we can't let corporations fight for our civil rights; they are not citizens, and thus will almost always have little to lose by giving up.
Yes, we're still free to fight on our own, but this is going to take a lot of momentum away. With P2P fragmented amongst a dozen different networks, it's going to be hard to be able to point to something and say, look, if 35 million people engage in a behavior, then by any concept of a social contract based government it cannot possibly be illegal. What are the chances David Boies is going to work pro-bono for Gnutella?
Ugh. Well maybe the rest of the big 5 will be typically shortsighted and this will all fall through, and we'll finally get this decaying mess of an anachronistic copyright system hauled in front of the Supreme Court. Or maybe this is better; maybe it's best that it doesn't get that far until the costs of letting media conglomerates rewrite the copyright laws become abundantly clear to everyone.
And it doesn't play VCDs, SVCDs and the like, unlike most $170 DVD players nowadays.
But it has better DVD quality than those $170 players; better, even than most $500 players. (This is going on reviews I've read, not my own personal experience.) You may need VCD support and a 5-disc changer, but most people only care about quality and price.
So the only good reason to get a PS2 anytime soon is for the (you guessed it) games.
Well, the people in Japan who have only bought an abysmal average of 1.8 games per PS2 sold seem to disagree with you. Of course, that number will rise once the games stop sucking so much.
You may find the DC has better graphics for most titles and Dead or Alive 2, which is on both platforms, looks much better on the DC.
It's not as direct a comparison, but Quake 3 on the DC looks better and runs smoother than Unreal Tournament for the PS2. And while this was true for Q3 and UT on the PC, it's fair to say that the DC Q3 comes closer to the original than UT on the PS2 does, an impressive statement considering Q3 only really pulled ahead on high-end PC's. The PS2 is clearly the more powerful machine as a simple sum-of-its-parts, but half the VRAM and much less ease-of-programming are apparently hurting it quite a bit.
Sony has a terrible software ratio (1:1.8), which makes the $188 hit on each PS2 more difficult. And as you know, the $$$ come from software sales.
Why is Sony losing on the PS2? Easy, the PS2 happens to be one of the cheapest DVD players on the market.
Sony has apparently forgotten all the lessons they so ably taught Nintendo and Sega with the launch of the PS1. With the PS2, Sony has copied Nintendo's restrictive licensing, the Saturn's difficulty-of-programming, and both Nintendo and Sega's head-in-the-sand arrogance. Hell, they've even copied 3DO and the Philips CD-i with their harebrained scheme to turn the PS2--which doesn't even ship with a modem--into the mythical set-top-entertainment-center-information-super-on-ramp of Convergence Past, Present and Future. Instead, as you insightfully point out, they may only succeed in losing a hell of a lot of money selling cheap DVD players to people with little intention of playing games. Sony is obviously betting that those people will justify the high purchase price to themselves as cheap-for-a-DVD-player, and then start buying games on the justification of well-I-already-own-the-console. Of course, in order for that to work, the games have to be fun, which they currently aren't. The question is how quickly they'll become so.
This is going to be a damn interesting round of the video-game wars, perhaps the most interesting yet. Objectively, Sega ought to be in a fine position--decent DC sales, great price, PS2 shortages, online gaming outta the box, and a crop of games which matches the PS2 graphically and bests it in gameplay. But they're losing money, and most importantly, they've lost hype. Hardcore gamers love the DC--but they've already got one. For everyone else, the only thing that's gonna get them to buy a DC is that--as you suggest--they go to the store looking for a PS2 and find out there are none. I'm not so sure that this is the sort of thing you want to build a market strategy on.
And then there's the XBox. So far, MS is playing the role of Sony in the last round--listen to developers, make the machine easy to program, snap up as many big-name titles as you can. Of course the big difference is in timing--the PS1 came out second, but only because Sega rushed the Saturn launch, with disastrous results. The XBox is coming late, which is held out by some as the fatal mistake of the N64. But with the lateness should be a corresponding technical superiority, something N64 didn't have. Plus it'll have a ton of top-tier 3rd party games, another fatal weakness of the N64.
It used to be everyone ridiculed the XBox as misguided, bloated, underpowered vaporware. Nowadays the only place you run into those opinions is slashdot, and less and less even here. Time and Newsweek are still sold on the PS2 hype, but developers appear to have moved on, and regularly gush about the XBox. I'm sure we all hope the latter group is more important in the long run.
And then, of course, there's the GameCube. Well, it's nice, and it comes in cute colors. To me it just screams XBox-lite--more powerful and easier than the PS2, but not as powerful or easy as XBox. eDRAM is some pretty hot technology, but still expensive and difficult to fab. Frankly, I don't trust it in the hands of ArtX any more than I would the Bitboys (Oy!). And I just don't think the rest of the system is going to be up to snuff, especially by the time it launches.
What Nintendo has going for it is some hot properties--Mario, Zelda, Metroid, DK, Pokemon. But while some great games have been made out of these, they're in a shrinking niche of the gaming industry, as the power of technology is allowing video games to become much more complex and appeal to an audience far beyond 9-14 year old boys. Meanwhile, MS seems to have miraculously gotten a share of or stolen outright all the great games which were once reasons to look forward to a PS2--Halo, MGS2, Oddworld, Crash, etc. I've heard EA is about to be signed, if they're not already. About all Sony has left is Square, and we'll see for how long.
So if I had to guess, I'd go with the XBox as the victor, the DC as becoming a small but solid success for a Sega desperately in need of that, the NGC as being the same for a Nintendo with rather greater aspirations for it, and the PS2 as garnering significant marketshare but without earning Sony either the profits or the influence it has apparently decided are inevitable.
Despite Microsoft's continued insistence, it's a Windows PC in console packaging and as such is very easy to develop for, since so many people know DirectX.
No, it's a console which runs an embedded-NT kernel and Direct X. It doesn't have a motherboard in the traditional PC sense, but rather a mainboard like a console, with both the CPU and GPU sharing the 6.4 GB/s bus to RAM in a UMA architecture. Yes, the CPU and GPU share the same basic internal core as a PIII and NV20, but they are custom chips, not off-the-shelf. Alright, it's closer to a PC than any other console, but it's not just a Windows PC with different form factor.
And it's not just easier than PS2 for those who know Direct X already: to program the PS2 you need to essentially write balanced assembly code for 2 vector processors all while streaming textures into the tiny 4MB VRAM fast enough to keep up with the action. The libraries that ought to be around to help developers do basic tasks are apparently rather scarce. Meanwhile, Direct X, whatever one may think of it, is certainly a far superior solution, even for someone who has never used it before. That's why in addition to PC developers making the cross-over, there's a whole lot of console-only developers on that list.
When I use Napster, I only share about half my MP3s (still over 2 GB, and generally my more interesting stuff), because if I share them all it takes abominably long to log in.
When I use Gnutella, I often don't share at all, because my CPU utilization goes very high when I do, and then I can't listen to the new MP3s I'm getting without skips. (I assume this is due to my computer needing to check every search string that comes through against my list of shared files.)
Both of these problems are fixable with increased bandwidth and computing power. (Or maybe I just have a buggy version of Gnutella.) I'm very enthusiastic about the possibilities of P2P, and I genuinely try to share as much as possible. While I realize not everyone on Gnutella or Napster is as idealistic, I have a feeling the percentage who are is a good bit higher than the 2% (or whatever) reported. Of course you can't blame CNet for taking the "corporate whore" view of human nature, but in my experience people like to share with each other, and will especially do so whenever it is easy and doesn't have noticeable drawbacks.
Fortunately for Intel, they didn't have to take any risks, since every single one of the things you mentioned was done by someone else first. Hell, the Alpha alone did all of them before Intel did. Not one of these technologies was "in its infancy" when Intel deployed them.
The only risk Intel takes in deploying any of these technologies is the risk that Intel customers won't buy them. That's the risk every company takes when introducing a new model. While yes, it means Intel is taking risks, none of the risks Intel takes actually advance the state of the art.
That's just because he came up with a bad list. Despite the fact that there are very few totally new ideas in the MPU industry (just as there are very few totally new software algorithms), Intel has indeed bet the farm (well, bet the product line) on some very radical design ideas, both in the past and the present.
Some were successful, some crashed and burned. One design that was extraordinarily innovative and successful was the P6 core, introduced in 1995 with the PPro. In it, Intel managed to do "the impossible"--execute variable-length x86 code out-of-order, something that was supposed to be only possible with a fixed-length ISA and was even relatively state-of-the-art there. The way they did this was by essentially "emulating" x86 code by decoding it into internal "RISC-like" ops, which could be run OOO. While I doubt this was an entirely new idea, I'm not aware of any previous implementations of it, much less one as wildly successful as the P6.
One design that was a horrid failure was the iAPX432, an MPU spread out over 3 chips which essentially operated in an object-oriented manner, rather than iteratively like, well, every other chip in history. Perhaps a sign of what was to follow was the fact that the 432's "assembly code" was actually built to closely model Ada, the government's ill-fated OO language. The 432 somehow managed to work, but performed a bit slower than mainstream MPUs from 5 years beforehand. Not too many sold. But there is no doubt that here Intel took a huge risk based on a very interesting idea.
Nowadays Intel is engaged in exactly the same "risky" design behavior in an attempt to further the state of the art. The P4 contains several totally new innovations. Perhaps most prominent is the trace cache, an L1 instruction cache which instead of just dumbly storing instructions, orders them safely and unrolls loops, allowing branch- and dependency-free operation for large swaths of code. In addition, the trace cache stores those internal "RISC-like" ops, not x86 ops like a normal instruction cache; this takes the x86->"RISCop" decoder out of the critical path and should result in higher top-clock-speeds and excellent performance on small looped code which can fit in the L1 trace cache--3D engines, encryption, and FFT (i.e. audio/video encoding/decoding, voice recognition), for example. Trace caches are not a new idea; they've apparently been studied quite a bit in the literature. However, the P4 is the first commercial MPU to include one, and that's a substantial engineering innovation.
Another innovation which is, from what I've heard, actually a totally new idea is the P4's double-pumped ALU and supporting hardware. While the idea of different pieces of hardware running at multiple speeds is of course not new, this is apparently the first time it's been worthwhile to implement it on-die in a commercial MPU. More impressive is the fact that Intel was actually able to get an ALU--one of the most studied logic circuits in history--to run at up to 4.0 GHz in current .18 um process technology. Apparently the way they did this is by implementing a new, lower-latency adding technique. This is the circuit-design equivalent of finding, for example, a faster sorting algorithm; it represents a very impressive achievement. While the double-pumped ALU will likely not have as large an effect on overall P4 performance as the trace cache, it should help out noticeably and it's definitely a radical design.
On the other hand, we have Intel's upcoming IA-64 ISA, an attempt to move the VLIW philosophy from specialized DSP work into general-purpose computing. Again, VLIW is not a new idea, and the idea of a VLIW general-purpose MPU is not either. However, the Itanium is one of the first attempts to actually build one (Transmeta's Crusoe is the other).
Furthermore, it represents quite a risk from a performance standpoint. The basic idea behind VLIW is to in effect take the RISC revolution one step further. While the RISC vs. CISC debate is often treated as a fair fight capable of producing one victor, the reality was quite different. (The following is essentially a synopsis of this excellent article on ArsTechnica.) Instead, each was the best ISA philosophy for the prevailing conditions at the time. CISC was the best design choice for its time--that is, up until the early 80's--and "pure RISC" the best for its time--from the mid 80's until the mid 90's.
The main issues involved the evolution of storage capabilities and compiler technology. First a broad comparison of what CISC and RISC actually mean: CISC refers to a category of ISAs in which a new instruction is conceived of to take care of every possible situation. A (made-up) example of a CISC-like instruction is the following:
CRAZY_OP, mem1, r1, mem2
which does the following: load mem1 from memory, take r1 from a register on the chip, compute (mem1 - r1) / r1^2, and store that in mem2. And there actually were some CISC instructions which were nearly that crazy. The RISC philosophy, on the other hand, would break that one operation down into many--one to load mem1 to a register, one to subtract mem1-r1, one to multiply r1*r1, one to divide the two, and one to store the result, for a total of...lessee...5 instructions.
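For the curious, here's a rough Python sketch of that breakdown. The toy machine, the register names t0/t1, and the op names are all made up for illustration; no real ISA is being modeled.

```python
# Hypothetical toy machine: memory and registers are plain dicts.

def crazy_op(mem, regs, src, reg, dst):
    """One CISC-style instruction: mem[dst] = (mem[src] - r) / r^2."""
    r = regs[reg]
    mem[dst] = (mem[src] - r) / (r * r)


def risc_sequence(mem, regs, src, reg, dst):
    """The same work as five simple steps; returns the instruction count."""
    program = [
        ("load", "t0", src),         # t0 <- mem[src]
        ("sub", "t0", "t0", reg),    # t0 <- t0 - r1
        ("mul", "t1", reg, reg),     # t1 <- r1 * r1
        ("div", "t0", "t0", "t1"),   # t0 <- t0 / t1
        ("store", dst, "t0"),        # mem[dst] <- t0
    ]
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "store":
            mem[args[0]] = regs[args[1]]
        elif op == "sub":
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "mul":
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "div":
            regs[args[0]] = regs[args[1]] / regs[args[2]]
    return len(program)
```

Both routines leave the same value in mem2; the only difference is whether the chip or the instruction stream carries the complexity.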
What's the difference? Well, like I said, it came back to storage capabilities and compiler technology. Back in the 70's when CISC was the Right Thing To Do, storage was extremely expensive and thus very scarce. If chips back then had used my RISC design, such an operation would have taken 5 instructions to code; with my CISC design, it takes just one. Yes, the CISC design might need to reserve some extra bits in the opcode field in order to code for so many ridiculous instructions, but overall the compiled RISC code is going to take at least 4 times as much storage space as the CISC code. So even if you didn't expect to run into the above situation very often, it made sense to have an explicit code name for it whenever you did.
As we hit the 80's, these storage issues rapidly eased, to the point where it wasn't such a hardship taking 4 times as much space to say the same thing every once in a while. Meanwhile, back in the CISC way of doing things, you actually needed to find some way to make your chip capable of performing all the goofy instructions that might be asked of it. In essence, it's almost like your assembly code is "compressed" to save storage space, and thus needs to be "decompressed" by the chip. This means complicated chip implementations, each trying to do more in each clock cycle--which means lower top clock speeds. The RISC chip may need more cycles to perform all 5 instructions, but since it only performs simple instructions, it can have a higher clock speed and thus come out ahead.
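To put rough numbers on the cycles-vs-clock-speed trade-off, here's a tiny Python sketch. The cycle counts and clock speeds below are invented purely for illustration and don't describe any actual chip.

```python
def run_time_ns(instructions, cycles_per_instruction, clock_mhz):
    """Time to run a sequence, in nanoseconds."""
    cycles = instructions * cycles_per_instruction
    return cycles * 1000.0 / clock_mhz


# Hypothetical: one complex instruction needing 12 cycles on a 25 MHz chip...
cisc_ns = run_time_ns(1, 12, 25)
# ...versus five simple one-cycle instructions on a 50 MHz chip.
risc_ns = run_time_ns(5, 1, 50)
```

With these made-up figures the five simple instructions finish well ahead of the single complex one, which is the whole argument in miniature.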
But there's a problem with this too: people generally like to program in high-level languages. RISC is a low-level ISA philosophy. Thus you need to have good compilers, to be able to analyze high-level instructions and decompose them into all their composite parts for encoding in a RISC assembly language--often a more difficult process than in my example. Again, the compilers of the 70's weren't up to the task; only in the 80's did good enough compilers come along to enable this. In essence, we moved the "decompression" of a high-level instruction to its low-level constituent operations from inside the chip (CISC) to in the compiler (RISC).
Thus, we went from CISC being a Good Thing to RISC being a Good Thing. The main issues were 1) code bloat not such a big deal and 2) move more instruction scheduling duties to the compiler.
Since that time, we've moved from what I called "pure RISC" to what Hannibal in the article I'm summarizing calls "post-RISC". That is, people started realizing that with RISC operations being more-or-less uniform, a good way to make things go faster was to do more than one thing at a time, and that instead of sitting and waiting on a long memory access, etc., you could switch and do other stuff at the time. Thus we got superscalar and out-of-order execution, respectively.
Moreover, we got deeper and deeper pipelines--sort of like assembly lines, in which each instruction goes through several stages, each 1 clock long, in its execution. This means we can clock the chip faster (less to do on each clock cycle), and get overall faster performance (think a fire brigade of 10 people each passing buckets a short distance, vs. one person running 10 times as far between buckets delivered). The problem is that, unlike buckets or trucks, code has dependencies; instruction 2 might take as its input the result of instruction 1, which is still in the pipeline--only halfway down the assembly line, as it were. Thus we need rescheduling logic to keep our pipeline stuffed--our assembly line filled--with instructions which don't depend on each other. Or, instruction 1 might be a branch instruction, which goes one way or another based on its result, so that we don't know "what comes next" until it is completely finished. Thus we use branch prediction, which uses some statistical methods to guess what comes next, and execute it accordingly, while aware that if when we get to the end of instruction 1 it turns out something else came next, we need to go over and do that instead.
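Here's a rough Python sketch of both ideas. The timing functions are just the standard fill-then-one-per-cycle arithmetic; the predictor is a textbook 2-bit saturating counter, which I'm using as a stand-in and not claiming is what any particular chip does. All names are made up.

```python
def unpipelined_cycles(n_instr, n_stages):
    """Each instruction runs through every stage before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages, stall_cycles=0):
    """Fill the pipe once, then finish one instruction per cycle.
    stall_cycles counts bubbles from dependencies or wrong guesses."""
    return n_stages + n_instr - 1 + stall_cycles


class TwoBitPredictor:
    """Counter 0-1 predicts not-taken, 2-3 predicts taken."""

    def __init__(self):
        self.counter = 2  # start out weakly predicting taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run the predictor over a list of actual branch outcomes (True = taken)."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

For 100 instructions on a 10-stage pipe, that's 109 cycles instead of 1000, as long as nothing stalls. A loop branch that is taken nine times and then falls through costs this predictor exactly one wrong guess, at the exit.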
The result of all this out-of-order superscalar pipelined "post-RISC" stuff was much higher IPC (Instructions executed Per Clock), but also lots of complicated logic on MPUs to handle all the scheduling and dependency checking and prediction. Theoretically, just as all the complicated logic made CISC chips complicated and slow, all this complicated logic makes today's post-RISC chips too complicated, too large, too hot, and slower than they might otherwise be. [end summary]
Thus, the basic idea behind VLIW is an extension of the idea behind the CISC->RISC transition. To wit: why not take all this complexity out of the MPU and put it back into the compiler? That way, we can get rid of all the unpleasantness once, at compile time--on the developer's time, not the user's. The way it does this is by trying to find all the parallelism, work out all the dependencies, and predict all the branches at compile time--in other words, to do all the scheduling at compile time. The way it communicates this to the chip, then, is to compile not to individual instructions for the chip to schedule, but rather into prescheduled "bundles"--or "Very Long Instruction Words"--which are supposed to be guaranteed to work well when run together in parallel.
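As a very rough illustration of compile-time bundling, here's a greedy Python sketch. It assumes every register is written only once (so only true read-after-write dependencies matter) and that every operation takes one cycle; the function name and the bundle width are hypothetical, and real VLIW compilers are far more involved.

```python
def bundle(instructions, width=3):
    """Pack instructions into bundles of independent operations.

    instructions: list of (dest, [sources]) in program order.
    Assumes each dest is written exactly once, so there are no
    write-after-read or write-after-write hazards to worry about.
    Returns a list of bundles, each a list of dest names.
    """
    ready_at = {}  # register -> index of the bundle that produces it
    bundles = []
    for dest, sources in instructions:
        # An instruction must come after every bundle producing its inputs.
        earliest = 0
        for src in sources:
            if src in ready_at:
                earliest = max(earliest, ready_at[src] + 1)
        # Take the first bundle at or after that point with a free slot.
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(dest)
        ready_at[dest] = slot
    return bundles
```

Feed it a, b, c = f(a, b), d = g(c), e with width 2 and you get three bundles: [a, b], [c, e], [d]. The independent e gets hoisted into the free slot next to c, which is exactly the kind of work the chip no longer has to do at run time.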
Or rather, this is how VLIW works where it is normally used--in DSP type processors, running programs for which it is very easy to extract this sort of data at compile-time. Problem is, it is much more difficult to do with general-purpose programs, which is why it hasn't been done before. As you might guess, there's just too much you don't know at compile-time for you to get unambiguous scheduling information. Transmeta solves this problem by compiling at run-time, using their code-morphing software, essentially a JIT compiler. The problems with this are obvious and well known: namely, that the JIT compiler uses resources which would otherwise go to running the program, and that you don't get the VLIW benefit of doing all the optimization once and forgetting about it. (The code-morphing software caches, profiles, and further optimizes the code it's already run, but it is still always running, and doesn't save this information from session to session.) Indeed, you're essentially moving the scheduling problem from one which is done by specialized on-chip logic in different pipeline stages than the execution logic--and thus not competing for execution resources--to one which is run by the general-purpose execution logic; a shaky trade-off at best. On the other hand, by working in software you theoretically get more flexibility to schedule instructions than when doing the scheduling with a chip's fixed logic.
The way IA-64 handles this problem is to have the compiler insert "hints" about which instructions look like they *might* be able to run in parallel, without dependencies; which way a branch is *likely* to go; which scheduling is *likely* to make good use of the chip's execution resources. The problem with this is, as the hints are inevitably going to be wrong, the chip needs its own analogues of much of the scheduling hardware it was trying to get rid of in the first place. In some ways, it's little more than a change in terminology: with OOE designs you have a smallish general register set with a large set of "rename registers", so that each instruction running in parallel essentially thinks it has a full copy of the general register set all to itself; with IA-64, you just have a huge general register set so that each parallel instruction has enough registers to work with.
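To make the "rename registers" half of that comparison concrete, here's a minimal Python sketch of renaming. It's a simplification under loud assumptions: physical registers are never freed, the p0, p1, ... names and the pool size are invented, and real hardware does this with dedicated tables, not dictionaries.

```python
def rename(instructions, n_physical=16):
    """Give every write a fresh physical register.

    instructions: list of (dest, [sources]) using architectural names.
    Returns the same program rewritten with physical names. Sources
    that were never written here pass through unchanged (live-in values).
    """
    mapping = {}   # architectural register -> current physical register
    next_free = 0
    renamed = []
    for dest, sources in instructions:
        # Read the current mapping before this instruction's own write.
        new_sources = [mapping.get(src, src) for src in sources]
        if next_free >= n_physical:
            raise RuntimeError("out of rename registers")
        phys = f"p{next_free}"
        next_free += 1
        mapping[dest] = phys
        renamed.append((phys, new_sources))
    return renamed
```

After renaming, two back-to-back writes to r1 land in different physical registers, so the second no longer has to wait for readers of the first. IA-64's approach, as described above, is to hand the compiler enough architectural registers that it can avoid the reuse in the first place.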
The problem is, of course, that you haven't done what you set out to do--eliminate complex scheduling logic from the processor. Instead, you've just replaced it with similar but less-well understood versions of the same stuff. The end result is that the Itanium core, far from being small, simple and clocking fast, is huge, complex, unbalanced, and therefore capable of pitiful clock speeds. The die is ~300mm^2--roughly 3 times the size of a P3--yet only has room for a total of 16kb L1 and 96kb L2 cache, less than even a lowly Celeron. (Server level chips like Itanium generally need much *larger* caches than PC chips; Itanium is supplemented with a large off-chip L3 cache, but it is too high-latency to be much use.) Itanium was supposed to launch in early 1998 at 800MHz; it is only now yielding above 733MHz--again, Celeron territory.
Furthermore, we run into trouble from an unexpected place--code bloat. Of course, it's not the same problem as in the 70's, when we used CISC ISA's to keep code small so that they could be stored at all; today's 100GB HD's testify to that. Rather the problem is that *bandwidth* to storage is very often the limiting factor with today's technologies, and that high-bandwidth storage--i.e. on-chip cache--is just as scarce as overall storage was in the 70's. With all its hints and bundling and exception codes to execute if the hints turn out wrong, IA-64 is much more bloated than x86 or RISC code, and thus those not-even-Celeron sized on-die caches are effectively even smaller.
Of course, Itanium has more functional units than the P6 core, and if all these compiling tricks actually keep them full of instructions, it will perform much better per clock. Unfortunately, all indications are that even with the relaxed "hints instead of guarantees" rule, it's still just too difficult for today's compiler technology to keep this monster even remotely well-fed. Intel even had the gall to claim at their recent Intel Developers' Forum that the SPEC CPU benchmarks were "irrelevant" for Itanium's target market, offering instead a (hand-written in assembly) RSA encryption benchmark in which Itanium demolished a Sun USIII. Well, that's fine, except that a very cheap dedicated encryption chip can beat the Itanium at this game several times over for 1% the cost and power requirements. Of course, the SPEC benchmarks run exactly the sorts of programs used in Itanium's target market, and are the most relevant measure possible. And not coincidentally, they are extremely sensitive to compiler quality.
So...to get back to our original topic, IA-64 is another huge risk--Intel has repeatedly called it a "bet-the-company thing"--which incorporates some very interesting, non-mainstream ideas in an attempt to radically advance the state of the art. And so far, it appears not to be working.
Don't worry too much about Intel, though; from all indications, McKinley, the 2nd-generation IA-64 core, should perform just fine thank you. Interestingly enough, it was designed almost entirely by HP engineers. But it also must be emphasized that they have clearly learned from Intel's myriad mistakes with Itanium. (Everything about Itanium, from the pitiful tacked-on caches to the rather unnatural pipeline design--apparently an extra stage needed to be added late in the design process--indicates that this design was a "learner".) Plus, Itanium has been delayed so long that the almost-on-schedule McKinley is due out relatively soon--roadmaps have it as soon as Q4 2001 (dubious), and Q1 2002 might actually be reasonable. McKinley should clock just fine (although not as high as the CISC front-end P4), and has plenty of on-die cache. And in a year and a bit, the compilers might finally be ready too.
So, Intel might just turn this risky strategy into gold. Maybe the "post-RISC" paradigm *will* run out of gas soon, and VLIW will speed past. The point is, for better or worse, Intel's MPU designers are not conservative in the least.
AMD, on the other hand, has never introduced any significant new MPU design techniques that I can think of; instead they concentrate on implementing Intel's designs better than Intel. Indeed, their first PC MPUs had the same names as Intel's--AMD made a "386" and a "486", and possibly a "286" too, I don't remember. The much-vaunted K7 is really quite similar to the P6 core, just with more functional units, larger buffers, more decoders--more more. It's a better version of the P6 (though less power efficient), but it's not terribly innovative. Of course, AMD was in a precarious enough position market-wise that they didn't have the luxury of taking engineering risks. Intel being relatively secure (and perceiving themselves--Andy Grove's catchy business-trade bestsellers notwithstanding--as even more so), they can and do experiment with some wacky stuff. Some of it works, some of it doesn't, and for some of it they take their massive market power and force it to work.
Way to read the article, folks. "And at its labs in China, the company is developing an embedded operating system and emulation software that would run PC and Sony Playstation games on the RISC chip"
This is a general-purpose, 300MHz RISC chip. There is no way in hell it could emulate the special-purpose, 300MHz SIMD-based Emotion Engine, nor its dual-channel RDRAM memory.
Nor is there any reason why ALi would ever want to do so. All gaming consoles, including the PS2, are sold at a loss. The only reason Sony wants to sell PS2s is that they make money on licensing fees for each game sold. The console is a loss leader. There is no way to make a PS2 for less than $300; if there was, Sony would sell them for less than $300. Therefore it is a very simple conclusion that this chip emulates the PS1--simple enough to be emulated without incurring additional cost--not the state-of-the-art PS2.
I meant, from the launch of the P4 until...until whenever. For the next month-and-a-bit, AMD most definitely has the highest-performing x86 chip around, bar none--as it arguably has for the last 11 months or so. (Intel's highest speed grades have been available in such laughable quantities that the benchmarkers were the only ones to get a hold of them; thus it's arguable whether they ought to count any more than the 1.13 does now--i.e. not at all.)
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review nor rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promoting" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.
4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.
And, just FYI, I have read every single article on microprocessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most/.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy.;)
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from/. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it is certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed.
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
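For anyone who wants to check that comparison, here is a rough back-of-envelope sketch. It assumes performance scales as IPC times clock speed (which ignores memory bottlenecks entirely), and every input is one of the estimates quoted above, not a measurement. The function name is just something I made up for illustration.

```python
# Back-of-envelope only: relative performance ~ relative IPC x relative clock.
# ASSUMPTION: no memory or platform bottleneck; inputs are the post's estimates.

def relative_performance(ipc_ratio, clock_ratio):
    """Speedup over a baseline core, assuming performance = IPC x clock."""
    return ipc_ratio * clock_ratio

# K7 vs. P6: roughly equal IPC, ~25% clock advantage on comparable process.
k7_vs_p6 = relative_performance(ipc_ratio=1.00, clock_ratio=1.25)

# P4 vs. P6: 15-20% better integer IPC (DeMone's estimate), ~2x clock (roadmap).
p4_vs_p6_low = relative_performance(ipc_ratio=1.15, clock_ratio=2.0)
p4_vs_p6_high = relative_performance(ipc_ratio=1.20, clock_ratio=2.0)

print(f"K7 vs P6: {k7_vs_p6:.2f}x")
print(f"P4 vs P6: {p4_vs_p6_low:.2f}x to {p4_vs_p6_high:.2f}x")
```

Under those assumptions the K7 lands about a quarter ahead of the P6 while the P4 lands at better than double, which is the size of gap I mean when I talk about a full "generation".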
Again, don't get me wrong: the Athlon is clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you?...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
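If you want to see where those two bandwidth figures come from, the arithmetic is just transfer rate times bus width. A minimal sketch, assuming 64-bit data paths on both sides, with the EV6 FSB counted as 200 million transfers per second (100 MHz, double-pumped) and PC133 as 133 million; the function name is mine:

```python
# Peak theoretical bandwidth = transfers per second x bytes per transfer.
# ASSUMPTION: both the FSB and the SDRAM interface are 64 bits wide.

def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bits):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_sec * (bus_width_bits / 8) / 1e9

ev6_fsb = peak_bandwidth_gb_s(200e6, 64)   # K7 front-side bus
pc133 = peak_bandwidth_gb_s(133e6, 64)     # single-data-rate PC133 SDRAM

print(f"EV6 FSB: {ev6_fsb:.2f} GB/s")
print(f"PC133:   {pc133:.2f} GB/s")
print(f"Memory can feed only {pc133 / ev6_fsb:.0%} of the FSB's peak")
```

These are peak numbers, so real sustained bandwidth is lower on both sides, but the mismatch is the point: the bus can carry about a third more than the DRAM behind it can deliver.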
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes, though.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I was doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 KB L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512 KB L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
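To put rough numbers on both points, here is a quick sketch. The bus widths and latencies are the ones mentioned above; everything marked ASSUMPTION (the timeslice length, the L1 latency, the L1 miss rate, and one L2 transfer per core clock) is a made-up illustrative value, not a measurement of any real chip.

```python
# Rough numbers only; constants marked ASSUMPTION are illustrative.

CLOCK_HZ = 1.0e9        # "over one billion cycles per second"
TIMESLICE_S = 0.010     # ASSUMPTION: ~10 ms scheduler timeslice
L1_HIT_CYCLES = 3       # ASSUMPTION: L1 load-to-use latency
L1_MISS_RATE = 0.05     # ASSUMPTION: 5% of loads miss L1 and hit L2


def l2_bus_gb_s(width_bits, core_clock_hz):
    """Peak L2 bus bandwidth, assuming one transfer per core clock."""
    return (width_bits / 8) * core_clock_hz / 1e9


def avg_load_cycles(l2_latency_cycles):
    """Average load latency, assuming every L1 miss is served by L2."""
    return L1_HIT_CYCLES + L1_MISS_RATE * l2_latency_cycles


cycles_per_timeslice = CLOCK_HZ * TIMESLICE_S

narrow_bus = l2_bus_gb_s(64, 1.0e9)     # 64-bit L2 bus at 1 GHz
wide_bus = l2_bus_gb_s(256, 1.0e9)      # 256-bit L2 bus at 1 GHz
p4_bus = l2_bus_gb_s(256, 1.5e9)        # 256-bit L2 bus at 1.5 GHz

fast_l2 = avg_load_cycles(7)
slow_l2 = avg_load_cycles(11)
latency_penalty = slow_l2 / fast_l2 - 1.0

print(f"Cycles between context switches: {cycles_per_timeslice:,.0f}")
print(f"L2 bus: {narrow_bus:.0f} vs {wide_bus:.0f} GB/s; P4-style: {p4_bus:.0f} GB/s")
print(f"Average load: {fast_l2:.2f} vs {slow_l2:.2f} cycles (+{latency_penalty:.1%})")
```

A context switch comes along once every ten million cycles or so, while the extra L2 latency is paid on every single L1 miss in between. Change the assumed miss rate and the penalty moves, but it never becomes a once-per-timeslice cost.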
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff.
Tell me, what is so innovative about _kludging_ old crap to high speeds and adding more superfluous, useless instructions? I say nothing. Innovation is by definition "something new or different".
That's like looking at a new race car and saying it's no better than a Model T because it's just "kludging old crap to high speeds". Now, if it had 5 wheels, then you'd be getting somewhere! That's innovation!!
Or like saying any new computer is still the same old crap kludged to high speed because it's all binary. If only we switched to ternary logic, that'd be innovation!! Or because all architectures these days use 8-bit bytes. What's wrong with 10, or 37? The only reason we use 8-bit bytes is for backwards-compatibility with extended ASCII, which is obsolete anyways! Why not move to 16-bit bytes with Unicode?! That'd be innovation! That's new!
Geez. It's not like the architects at Intel (or AMD, or Compaq, or anywhere for that matter) couldn't think up a new ISA in their sleep. It's just a dumb idea.
The Merced might've been an impressive chip, but why the f*sck do the still have keep dragging that 8086 shit behind?
Because the point of an MPU is to run programs, not to look beautiful on paper. The vast majority of the world's programs run only on x86; thus the vast majority of processor marketshare is going to be x86-compatible. Besides, if you actually knew and understood the amazing ways Intel (especially) engineers have managed to squeeze out performance despite working around the design constraints of x86-compatibility, you wouldn't run around calling an amazing core like the P6 a "kludge". Indeed, you obviously don't realize this, but the most important ingredient in getting a chip to high clock speeds is an elegant balanced design. (Manufacturing process is a close 2nd, and is why the Alpha, a more elegant design, is outclocked by the PIII and Athlon.)
I think we should go more into parallelism in programs and take the advantage of multiple, perhaps a bit slower, processors, not one huge frying pan.
Well that's wonderful that you think that. Unfortunately, instruction-level parallelism is very very difficult to extract from most computer code. Furthermore, the increased complexity of SMP buses and SMP motherboards makes the SMP option too expensive to hit the mainstream market for the foreseeable future.
Instead what we'll see is CMP--chip level multiprocessing, essentially having multiple chips on one die. Examples include the IBM POWER4 and prolly the AMD K8. This increases chip-to-chip bandwidth and gets rid of the motherboard costs, but doesn't solve the problem of low ILP in most code. A more interesting solution is SMT, simultaneous multithreading, which allows a single superscalar core to work on instructions from several different threads in parallel. Early indications are that SMT provides phenomenal performance boosts; Sun's embedded MAJC chips use SMT, but the first general purpose SMT MPU might be the Compaq Alpha EV8.
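Here is a toy illustration of why ILP is hard to come by and why SMT is an appealing way around that. It is a sketch under idealized assumptions (every instruction takes one cycle, and the core has unlimited execution units), and the function and the instruction lists are hypothetical, not a model of any real chip.

```python
# Toy model: ideal ILP = instruction count / length of the longest dependency chain.
# ASSUMPTION: unit latency for every instruction and unlimited execution units.

def ideal_ilp(deps):
    """deps[i] lists the earlier instructions that instruction i must wait for."""
    depth = []
    for waits_on in deps:
        # An instruction can issue one cycle after its slowest input is ready.
        depth.append(1 + max((depth[j] for j in waits_on), default=0))
    return len(deps) / max(depth)

# One thread, each instruction needing the previous result: nothing to overlap.
chain = [[], [0], [1], [2]]

# Two loads feeding an add feeding a store: slightly better.
mixed = [[], [], [0, 1], [2]]

# Two such chains from two different threads, as an SMT core would see them.
two_threads = [[], [0], [1], [2], [], [4], [5], [6]]

print(ideal_ilp(chain))        # 1.0
print(ideal_ilp(mixed))        # about 1.33
print(ideal_ilp(two_threads))  # 2.0
```

A single dependent chain gives a wide core nothing to do in parallel, no matter how many execution units it has. Instructions from a second thread are independent by construction, so the same hardware suddenly has twice as much work available per cycle.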
Ugh. Ignorant crap getting a +4 insightful. Well, let's get this over with...
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependency scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free, including this excellent article which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII led to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephelephy a.k.a. Mad Cow Disease), although I doubt that any British cattle was involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.
...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4's are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design. Thus the logical conclusion is that, just like the preproduction MPU's "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4's we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 are scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384KB total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256KB total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
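To make the "total cache" figures concrete, here is a rough sketch of the arithmetic. The L1 sizes (128KB for the Tbird, 32KB for the Coppermine) are filled in from the public spec sheets rather than from the post, and treating the Coppermine's L2 as strictly inclusive is a simplifying assumption.

```python
def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Distinct data the L1+L2 pair can hold, in KB (simplified model)."""
    # Exclusive: L2 holds only lines that are not in L1, so capacities add.
    # Inclusive: L2 duplicates whatever is in L1, so the larger level is the ceiling.
    return l1_kb + l2_kb if exclusive else max(l1_kb, l2_kb)

# L1 sizes are assumptions taken from spec sheets, not from the post.
tbird = effective_capacity_kb(l1_kb=128, l2_kb=256, exclusive=True)
coppermine = effective_capacity_kb(l1_kb=32, l2_kb=256, exclusive=False)
print(tbird, coppermine)
```

Capacity is only half the story, of course; the sketch says nothing about the 11-vs-7 clock latency or the 64-vs-256-bit bus, which is where the argument above actually rests.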
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. 
But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2 GB/s-4.27 GB/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
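The "wash" claim is easy to sanity-check with a back-of-the-envelope model: clocks lost per instruction = (fraction of instructions that are branches) x (mispredict rate) x (penalty). The 12 and 19 clock penalties are the official figures quoted above; the branch fraction and both mispredict rates below are made-up illustrative numbers, not measurements of either chip.

```python
def mispredict_cost(branch_fraction, mispredict_rate, penalty_clocks):
    """Average clocks lost to branch mispredicts per instruction."""
    return branch_fraction * mispredict_rate * penalty_clocks

def breakeven_rate(old_rate, old_penalty, new_penalty):
    """Mispredict rate at which the longer pipeline loses no more than the old one."""
    return old_rate * old_penalty / new_penalty

BRANCH_FRACTION = 0.2   # assumed: one instruction in five is a branch
P6_RATE = 0.10          # assumed mispredict rate, for illustration only
P4_RATE = 0.06          # assumed: "many fewer mispredicts"

p6_cost = mispredict_cost(BRANCH_FRACTION, P6_RATE, 12)
p4_cost = mispredict_cost(BRANCH_FRACTION, P4_RATE, 19)
target = breakeven_rate(P6_RATE, 12, 19)
print(f"P6: {p6_cost:.3f}  P4: {p4_cost:.3f}  break-even P4 rate: {target:.3f}")
```

Under these assumptions the P4 only needs to cut its mispredict rate to about 12/19 of the P6's to break even, which is the sense in which a better predictor can cancel out a longer pipeline.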
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note: WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4's are released and benchmarked.
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you.
...But rather encryption to restrict the recipient's ability to access the data after a certain period of time (a week). In truth, it does both very badly.
First it is clear that this cannot be a serious attempt at the "traditional" problem of encryption--for the reason pointed out in many posts (unsecure channel between sender and Yahoo!) as well as a deeper one--this system requires you to give full trust to both Yahoo! and Zixit, as there is no proof whatsoever that they will even bother to encrypt your email when passing it between themselves. (And if you would trust a potentially life-and-death secret to two companies named "Yahoo!" and "Zixit" then you deserve what's coming to you.) Finally, there is a huge problem with verification: the recipient merely needs to "verify" that they actually hold the email address the sender specified. And how, pray tell, do they do that? Likely they instead need only temporary access to that account to receive a (plaintext??) email giving them a temporary password. Good lord.
Instead it appears to implement an access control restriction--your recipient can only access the email for 7 days before it is gone forever. Of course, this fails for the same reason all access controls fail--the message must finally be displayed in plaintext on an untrusted machine, namely the recipient's. Assuming "Zixit" has implemented some (hackable) fix to the "copy-and-paste attack" (ala the International Lyrics Server), there is still the ever pernicious "screenshot attack". And as always, even if the recipient's machine could somehow be entirely trusted, there is the final undoing of any access control restriction--the digital-to-analog conversion. Just as I can always tape-record the SDMI music coming out of my speakers, and videotape that DVD playing on my TV, this scheme falls rather easily to a pen-and-paper.
Meanwhile, it doesn't even do the trick of "increasing the amount of encrypted emails the FBI has to look through", because all this traffic is presumably just SSL, and there's a whole bunch of that around. Besides, chances are the FBI/CIA/NSA/KGB/alien invaders would rather just install a keyboard sniffer or run a TEMPEST analysis on your computer than have to solve the FACTORIZATION problem or build huge special-purpose number sieves and spend several times the lifetime of the universe waiting around to read your email or invent a quantum computer. (Maybe the aliens would rather do the latter.) Or just bring a warrant to Yahoo!/Zixit, who *both* have full plaintext access to your "encrypted" email and will likely be very happy to comply with the FBI. (Or aliens pretending to be the FBI--has no one noticed how unsecure and spoofable search warrants are?)
Um, I think what I'm saying is, this appears pretty lame. The only "useful" thing I can think of that this does is destroy the message if it is not accessed within 7 days. Of course, trusting this means trusting that 1) Zixit actually destroys the message; 2) Yahoo! destroys their copy of it; 3) no one intercepted it when it was passed in plaintext from the sender to Yahoo!; 4) any logs or copies of it as it propagated (in plaintext) across the Internet between the sender and Yahoo! were destroyed; 5) it was actually encrypted between Yahoo! and Zixit...
[If Sony was smart, they would FLOOD the market with cheap PSX2s to get a HUGE installed console base. They then would be able to sell loads of games for years and years]
That was Sony's plan, but IIRC, they switched the chip manufacturing process from a .18 to a .13, and there were some problems during the transition. They didn't want to pass up the Christmas season, so they released it (in limited quantities)
First, that's .25 to .18. Second, they never planned on releasing more than 2.5 million American PS2's by Xmas (now they will be lucky to hit 1 mil., as they are reported to have not only cut the launch numbers from 1 mil. to 500k, but have missed their subsequent shipping targets as well), because they knew they couldn't make more than that.
Third, they are already losing money--quite a lot of money--selling them for $300. (I've seen estimates of a loss of $170/console, but they are probably outdated.) Fourth, it's not like this was a brilliant stroke of marketing genius on Sony's part--all consoles are sold at a loss, always and forever; the money is made back (theoretically and then some) by licensing fees on every game sold.
Fifth and finally, a $99 PS2 would compound Sony's worst fear (and biggest miscalculation IMO)--that is, that everyone would buy a PS2 because it was the cheapest DVD player around, and not to play (read: buy) games for it. Then Sony loses money on the initial sale, and doesn't gain it back on license fees for all the PS2 games that no one is buying because they're all using their PS2s as DVD players. Indeed, it appears as if this is what has happened so far in Japan, where the ratio of PS2 games/PS2 consoles sold is abysmally low. (Of course, the initial round of Japanese PS2 games were themselves abysmal; the US ones, while still no better than the Dreamcast, were much improved.)
Right now there is a small selection of PS2 games, so there is little competition between developers, plus releasing a game early means sales throughout the life of the console, not just a few months. Releasing a game early in a popular console's life is very beneficial to the developer
Actually, this is wrong as well. The PS2 launched in America with a remarkable 27 titles; over 50 are expected by Xmas. (Contrast the other extreme, the N64--which launched in September of 1996 with just 2 or 3 games IIRC and had only like 7 by the end of the year, something awful like that.) The problem, of course, is that those 50 titles are going to be split amongst what now appears to be barely 1 million PS2 owners. Assuming a game/system ratio (for the year) of 3 (this may be generous considering everyone has already shelled out $300 for the console, plus more for a memory card or extra controller required for many of the best games), that's an average of 60,000 sales/game. And that, in the console business, is a colossal failure. Meanwhile, because the PS2 presents programmers with a notoriously steep learning curve, most of the initial games are quite unimpressive, so it's doubtful many will become long-lasting classics or boost the public's perception of the developer.
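For anyone who wants to check the 60,000 figure or plug in their own guesses, the arithmetic is just consoles x games-per-console / titles. The inputs below are the ones from the paragraph above; the function name is mine.

```python
def avg_sales_per_title(consoles, games_per_console, titles):
    """Average unit sales per title if purchases are spread evenly across titles."""
    return consoles * games_per_console / titles

# Figures from the post: ~1 million consoles, 3 games each, 50 titles by Xmas.
shortfall_case = avg_sales_per_title(1_000_000, 3, 50)
# Same ratio, but with the 2.5 million consoles Sony originally planned to ship.
planned_case = avg_sales_per_title(2_500_000, 3, 50)
print(shortfall_case, planned_case)
```

The even-split assumption is obviously unrealistic (a few hits will take most of the sales), which only makes things worse for the titles in the long tail.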
A lot of developers are going to lose big with their initial PS2 games. While there are other forces at work, the general pace of a system's releases is chosen by the system's manufacturer. Many are predicting a lot of developers very unhappy at a Sony which goaded them into glutting the PS2 game market while being unable to hold up their end of the bargain by making PS2 consoles. Many developers are already complaining about the PS2's extremely unorthodox insides and complete lack (at this point) of high-level programming libraries. There are several other options out there... So far it looks like a disaster of a launch for Sony; the question is whether the PS2 has enough power, hype and support to rule the market anyways.
What makes Transmeta special is that they have put a dynamic binary translator in a chip and have developed silicon to make it faster.
No, actually you have it backwards. Intel (and later NexGen, AMD and I believe Cyrix) put a dynamic ISA translator *on their chips* starting with the P6--they decode (i.e. translate) x86 instructions into internal "u-op" instructions (AMD calls them "macro-ops", same idea) which are used by the rest of the silicon. (This is necessary because x86 instructions are too heterogeneous in length and complexity to work well in a deeply-pipelined out-of-order core.)
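A toy illustration of what "decoding into u-ops" means: one x86-style arithmetic instruction that touches memory gets broken into simple load / operate / store steps. The tuple instruction format, the `tmp` register names and the `decode` function are all invented for this sketch; real decoders work on binary encodings and are far more involved.

```python
def decode(instr):
    """Split one toy (op, dst, src) arithmetic instruction into simple micro-ops.

    Operands written like "[addr]" are memory references; anything else is a register.
    """
    op, dst, src = instr
    uops = []
    if src.startswith("["):
        # Memory source: load it into a temporary first.
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if dst.startswith("["):
        # Memory destination: read-modify-write becomes three micro-ops.
        uops.append(("load", "tmp1", dst))
        uops.append((op, "tmp1", src))
        uops.append(("store", dst, "tmp1"))
    else:
        uops.append((op, dst, src))
    return uops

print(decode(("add", "eax", "ebx")))      # register-register: one micro-op
print(decode(("add", "[counter]", "eax")))  # memory destination: three micro-ops
```

The point is that the out-of-order core behind the decoder only ever sees the uniform right-hand side, never the variable-length left-hand side.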
What Transmeta did was essentially move this translator *off* the chip, into software. The advantage of this is simpler silicon, and therefore lower power consumption. (Also, all things being equal, higher maximum clock speeds; all things are clearly not equal.) A secondary advantage is that far more resources (16MB IIRC) can be devoted to buffering, tracing, analyzing and optimizing the instructions than on a chip, where the physical chip-size keeps buffers small and optimizations simple. The disadvantage is that all this needs to be run on general-purpose (i.e. slower) silicon--and worse, competes for CPU-time with the very programs it is trying to optimize. (Not to mention takes up 16MB of system resources.)
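The software approach can be sketched as a translator with a cache in front of it: a block of code pays the translation cost once, and every later execution reuses the stored result. This is only a cartoon of the idea; the class, the string "instructions" and the `native_` prefix are hypothetical and have nothing to do with Transmeta's actual code.

```python
class SoftwareTranslator:
    """Toy dynamic translator: translate a block once, then reuse it from a cache."""

    def __init__(self):
        self.cache = {}        # stands in for the RAM set aside for translations
        self.translations = 0  # how many times we paid the (slow) translation cost

    def translate(self, block):
        # Placeholder for the expensive analyze-and-optimize step.
        self.translations += 1
        return tuple(f"native_{op}" for op in block)

    def run(self, block):
        key = tuple(block)
        if key not in self.cache:
            self.cache[key] = self.translate(key)
        return self.cache[key]

translator = SoftwareTranslator()
hot_loop = ["add", "cmp", "jne"]
for _ in range(1000):
    translator.run(hot_loop)
print(translator.translations)  # 1
```

The catch described above is visible even here: `translate` runs on the same CPU as the program it is translating, and the cache is memory the rest of the system doesn't get to use.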
So far the tradeoff has been (IMO) a big loser except in special circumstances--where you need long battery life, x86-compatibility (otherwise there are faster, smaller, more efficient chips out there, like anything in the ARM family), little weight (otherwise just use a bigger battery), and have efficient enough components for the rest of the system to actually make a difference (this is the gotcha with traditional laptops). Whether this particular set of circumstances will turn out to be a small or huge market niche, it is certainly a small problem space. Of course, much of the blame is due to TM's implementation rather than the (basically sound) idea; apparently their architecture is not up to Intel's standards (their process technology is IBM, so that's not the problem). Of course, mistakes are very common in the first iteration of a wildly new idea--witness Itanium (harnessing VLIW for very different ends--and arguably with less success) for proof of that.
Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.
Intel does offer their VTune compilers for sale, as they must in order to legally use them in the SPEC benchmarks where they perform so well. Unfortunately, there are widespread complaints and accusations that they are buggy and temperamental and fail to compile much code that works just fine with gcc, VS, etc. The charge that Intel gets its SPEC scores with compilers which are so optimized that they aren't robust enough for every day use has tarnished Intel's very impressive SPEC scores among some. I haven't ever tried to use VTune so I can't comment as to whether this is FUD or not. It is worth noting that VTune is much much faster than anything else in SPEC, yet rarely used in practice.
But Intel does also help other compiler makers incorporate optimizations. I know they specifically work with Cygnus to optimize gcc, and would assume they do the same with MS. AMD also works with compiler makers to get support for 3DNow. (For market reasons--i.e. they will always have smaller market share--AMD designed the Athlon to perform well on P3-optimized code, and thus there is not so much to be gained by including K7 optimizations over and above 3DNow. The P4, on the other hand, is very different from both of them and needs a recompile to perform well, as these numbers demonstrate.)
It's like I'm not even listening...
Don't worry; the SPEC scores have been very poorly reported, while the P4's rather poor performance on non-optimized code has gotten all the press. You are by far not the only one to have missed the SPEC scores and assume that the P4 is a dud. Of course, in some ways this is valid, since SPEC scores are more indicative of the potential of the P4 core than of how well a P4 will perform on today's code. Still, as it turns out, the Alpha scores I was comparing the P4 against in my original post are for chips that won't be released until January; so technically, the P4 has not just the SPECint2000 crown (base and peak) but the SPECfp2000 crown as well (base only)!
Okay, I'm not heavy into hardware, I just wanted to point out the numerous problems with the P4 - it seems that the processor itself is not one of them!
I certainly wouldn't go out and buy a P4 today--a DDR Athlon is a much better deal for today's software. But the SPEC scores show that once we get some P4-optimized software, it's gonna kick butt. So, mediocre as a current product, great as a debut for a new core.
Tolu, you always have something insightful to say about chip design, but you have to repeat yourself fairly often across articles - have you thought about bugging the /. crew for some permanent space to soapbox?
Thanks!
Eh, I do get carried away too easily I suppose. It's always a problem when you feel like you have all this relevant information that many people reading may not know, and you don't know how much of it to repeat. (I generally tend to go for "all of it".) As for submitting an essay to /., I hadn't really thought about it, especially since I feel that there are many people out there who have a whole whole lot more knowledge on MPU design than me. Of course, as I seem to be one of the few who actually tries to enlighten the more software-minded crowd at /., I suppose it might be worth a thought...
The Pentium Pro was the last new core from Intel. And may I remind you - the first issue of the PPro - It beat the Alpha! For like a month, until DEC moved to a new process, Intel beat the king of the RISC chips. So I don't think there is the precedent you seem to see - when Intel brings out a new core, I expect bang - not "it'll scale".
ROFL!
No one noticed it, but you got bang today as well.
The original PPro was, for about a month, and only barely, the fastest chip in the world in SPECint95. The P4/1500 is...the fastest chip in the world in SPECint2000. Its SPECint2k scores are 522/535 base/peak; the fastest previously available processor in the world is an Alpha EV67/833 which scores 511/533. Considering Intel will almost certainly release a P4/1600 before Compaq finally releases a faster clocked Alpha, this gap will almost certainly become even larger for the P4. (And then Alpha will *finally* move from a .25um to a .18um process and kick some butt.)
Even more spectacularly, the PPro shocked the MPU world by being somewhat competitive with the fastest RISC chips in SPECfp95--about 75% the top Alpha scores. Meanwhile, the P4/1500 put up SPECfp2000 scores of...549/558 base/peak, or roughly 90% those of the fastest Alpha.
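To put the integer numbers in perspective, here is the lead worked out as a percentage. The scores are the ones quoted in this post (P4/1500 at 522/535, Alpha EV67/833 at 511/533); the variable names are mine.

```python
# SPECint2000 base/peak scores as quoted in the post.
p4_int = {"base": 522, "peak": 535}
ev67_int = {"base": 511, "peak": 533}

# Percentage by which the P4 leads the Alpha on each metric.
margins = {k: (p4_int[k] / ev67_int[k] - 1) * 100 for k in p4_int}
for kind, margin in margins.items():
    print(f"SPECint2000 {kind}: P4 leads by {margin:.1f}%")
```

So the lead is real but thin (a couple of percent on base, well under one percent on peak), which is roughly the "only barely" situation the PPro was in.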
And yet, just as when the PPro was launched, all we hear about is how the P4 is a failure because it performs poorly on legacy apps. The P6 launched to universal derision from the mainstream computer press because it wasn't any faster than an ordinary P5 at 16-bit apps (yeah, maybe eventually there might be *some* 32-bit apps, but who's going to rewrite their code just to optimize for some new-fangled processor?); it was perhaps the most successful MPU design in history, as predicted by its astonishingly good SPEC95 scores.
The P4 is launching to universal derision from the mainstream computer press because it isn't any faster than an ordinary PIII or Athlon at x87 apps and apps which use instructions which the P4 explicitly deemphasizes in favor of faster replacements (yeah maybe eventually there might be *some* SSE2 and P4 optimized apps, but who's going to recompile or even rewrite some of their code just to optimize for some new-fangled processor?); its SPEC2000 scores are just as astonishing as the PPro's were, if not more so.
We'll see how it plays out this time.
Say what you want. My mom's roommate was an election monitor in Chicago in 1960. She showed up, and she was escorted from the room and told that, were she to actually try to monitor anything, she would likely come to grievous physical harm. She went home.
Of *course* there was no "proof". What kind of idiot rigs an election and leaves *proof* lying around?
Did you read the article I linked?
There were substantial investigations into election irregularities, not just in Illinois but in 11 other states. In Illinois there was a wide-ranging recount in all of the contested precincts--including, one must assume, your mother's roommate's--and they came back only +943 for Nixon, when he was behind by 4500 votes. (Compare, of course, to the current Florida recount, +1457 for Gore from an initial discrepancy of 1700 votes.)
Then the Republicans took the case to federal court, where it was thrown out due to lack of evidence. Then they took it to the state electoral board, where it was *also* thrown out due to lack of evidence. Note that no one has ever accused either the state electoral board or the federal courts of falling under the influence of the Cook County democratic machine; indeed, the electoral board was composed of 4 republicans and only 1 democrat.
What sort of lack of evidence?? Well, despite over a month with republican officials crawling the state, demanding recounts, and generally trying their darndest to come up with anything suggesting irregularities, they were not able to present a single affidavit from either an intimidated voter or a cheating election official--like your archetypical "mother's roommate". Indeed, if she had come forward with her story--and believe me, it wouldn't have been difficult as there was national press coverage of the republican team then scouring the state looking for such stories--then she would have constituted the single piece of solid evidence that we have today that such election irregularities actually existed. The only one.
Sorry, but I don't buy it.
More to the point, that isn't to say that it didn't happen, only to say that the Nixon campaign went as far and farther than the Gore campaign looks likely to do in questioning the results of the election, and they lost. If the Gore campaign manages to draw this election out with challenges and legal action, it may be distasteful and ungracious of them, but it will be no different from what the Nixon campaign did in 1960, popular myth to the contrary.
This is why math professors should stick to math. (Note that I'm a math major, so perhaps I should as well. ;)
Not that the longwinded article linked here actually got around to talking about Natapoff's "proof" in any detail, but from what I could piece together, all he proved is that under an electoral college system, each voter has a greater chance of deciding the election with his or her vote than under a direct election.
Well no shit, Sherlock. That's why we're sitting here watching a recount come in dozens of votes at a time, arguing about a couple hundred blind old ladies, and fretting about whether more Floridians overseas are serving in the military or dual citizens of Israel. OF COURSE a smaller number of voters has a larger chance of deciding an election under the electoral college system.
In other words, the e.c. is considerably more unstable and capricious than a direct election. There is a much greater chance that the true will of the people will not be reflected in the final result. Why we need a mathematical proof to investigate this is not totally beyond me, because it's an interesting combinatorial result (I'd assume). Why this Natapoff guy actually thinks this is a good thing, though, is utterly beyond me.
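For the curious, the "chance of deciding the election" can be computed exactly in a toy model. This is NOT the proof from the article, just a sketch under loud assumptions: 9 voters, each independently voting for candidate A with probability p, compared with the same 9 voters split into 3 winner-take-all blocs of 3. A voter "decides" the election when everyone else is exactly tied. All function names are mine.

```python
from math import comb

def p_tie(n_others, p):
    """Probability that an even number of other voters splits exactly evenly."""
    half = n_others // 2
    return comb(n_others, half) * p**half * (1 - p)**half

def p_majority(n, p):
    """Probability that an odd-sized bloc of n voters goes to candidate A."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

def power_direct(voters, p):
    """Chance one voter is decisive in a direct election (odd number of voters)."""
    return p_tie(voters - 1, p)

def power_blocs(blocs, bloc_size, p):
    """Chance one voter is decisive with winner-take-all blocs (odd counts)."""
    q = p_majority(bloc_size, p)  # chance any other bloc goes to A
    # My bloc must be tied without me AND the other blocs must be tied without mine.
    return p_tie(bloc_size - 1, p) * p_tie(blocs - 1, q)

for p in (0.5, 0.7, 0.9):
    print(p, round(power_direct(9, p), 4), round(power_blocs(3, 3, p), 4))
```

In this toy setup the bloc system gives the individual voter more power only once the electorate leans noticeably one way; at a dead-even 50/50 the direct election comes out slightly ahead. Either way, "more power per voter" and "more capricious result" are two descriptions of the same thing.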
His best argument (according to the article) is that we don't complain that the World Series is determined by who wins the most games, not who scores the most runs. Putting aside the fact that the two situations are *not* analogous (for one thing, the fact that there is a different starting pitcher for any given 4-5 games in a row is the most important argument for why we need a best-of-7 Series), the point here is that the World Series is put on for the purposes of *entertainment*, not of deciding who rules the free world. Not that I'm not having a lot of fun with these election results (side note--I helped elect a corpse! Whaddya think of that!), but there's an argument to be made that instability and lack-of-representation in results, while good for sporting events, are actually *bad* for presidential elections.
Furthermore, he shows absolutely no understanding of the greater "rules" of the electoral college "game". For example, the electoral college has, throughout the course of US history, served to prolong and promote slavery and remove incentives for granting female suffrage or encouraging higher voter turnouts. For some excellent explanation why, why don't we read a *relevant* article by someone who's actually qualified to talk about the electoral college, Akhil Reed Amar (Yale professor and one of the foremost academic experts on Constitutional law).
Just for the record, their father stole the 1960 presidential election for Kennedy away from Nixon.
Just for the record, that's a myth.
Fascinating article, but the upshot of it is that contrary to what is being stated in your post, there is no proof that voting fraud occurred in that election on a significant scale. More importantly, contrary to popular belief and nearly every major newspaper this morning, Nixon did not concede the election out of concern for the country's well-being, and neither did the charges go uninvestigated. Indeed, there were major investigations into allegations of voting fraud not just in Chicago but all over the nation, and all of them exonerated Kennedy.
In any case, it's important to remember that, due in no small part to the popular belief that he was robbed in 1960, Nixon got his presidency in 1968. So too did the two candidates in our history who actually were "robbed" by the electoral college (i.e. they won the popular vote but couldn't carry a majority in the e.c.), Andrew Jackson and Grover Cleveland--both won the presidency 4 years later.
Interesting to see what Gore will do...
As I understand, and I'm not an expert, but I heard this on the radio this morning, this can only occur in states that have "faithless electors". I'm not sure if Florida is one of them...if it is there is a chance on December 18 that they cast their vote in favor of who won the popular election.
It's more complicated than that.
Many states have provisions forbidding so-called "faithless electors"--i.e. electors who vote differently from the popular vote in the state. However, these provisions are all on the state level, not the federal level, and thus are (arguably) Constitutionally irrelevant to the actual Electoral College vote on Dec. 18.
I believe it is generally accepted that if an elector were to change his or her vote in a state with a provision against faithless electors, then the changed vote would stand but the elector would have hell to pay with their home state. On the other hand, this would certainly go to the Supreme Court if it occurred, and given the current anti-Federalist leanings of conservative judges, that result may be too close to call as well.
The way Rambus handled the situation might not have been the best, but they DO have the patents and they aren't trivial "one-click" ones either.
No, they're exactly like "one-click" patents, except that the way Rambus went about securing and enforcing them is much slimier. Of course, the concepts involved may not seem "trivial" to *you*, but you're not an EE, so--no offense--your opinion isn't what's important here. Indeed, the fact that not only has every other memory company developed interfaces which are "infringing" upon these patents but that Geoff Tate could actually say, "We think it would be difficult, if not impossible, to develop a competing technology to RDRAM and not infringe on our patents," strongly indicates that many of these patents are indeed obvious to someone skilled in the art.
Sort of like how "one-click" might seem like an inane concept to get a patent on to you, because you know that any decent programmer worth a salary could implement it; but to the average person, it involves a moderately impressive knowledge of cookies, databases, etc., and thus isn't trivial at all. Luckily, patent law provides that the former, not the latter standard is necessary. (Unfortunately, the USPTO seems to enforce only the latter anyways.)
As for Rambus' way of handling the situation being "possibly not the best"--no, it was a bit worse than that. What Rambus actually did was take part in JEDEC, the open industry-wide consortium developing the next DRAM standard, and secretly go about filing patents on the very standard being discussed. Either they simply failed to disclose the fact that they had already filed for patents on this technology (and possibly steered discussions into technologies sure to infringe on their patent applications)--which is not only immoral but illegal as well--or they actually went about filing these patents *after* the standards had already been agreed upon, which is hideously immoral and illegal.
The memory manufacturers know this. Companies will never pay royalties unless they absolutely, positively have to.
Not in the memory biz. Indeed, the reason the dramurai have been so reluctant to pay off Rambus is not that they're unused to paying royalties--the DRAM business is rife with royalties and cross-licensed patents; TI already gets patent royalties from every piece of DRAM produced, for example--but rather because they feel that Rambus has gone about getting these patents in a slimy underhanded way.
But this is balanced by the fact that before Samsung, every DRAM manufacturer licensing Rambus' patents was Japanese. Business in Japan is a much smarmier back-room affair than in the US, where giant corporations are used to working out complex patent licensing schemes on trivial patents. This is because Japanese patent law allows patents on ideas, not implementations, and trivial ideas at that--something that the US patent system "doesn't allow", or at least isn't supposed to. The result, of course, is a sluggish deflationary economy ruled over by a handful of giant patent portfolios--er, I mean corporations--in which all economic activity must first be cleared with complex licensing and back-room deals, and in which fighting off yet another extortive patent demand is almost unheard of. An economy which can't grow to save its life, despite interest rates of essentially 0%. The future of the US economy if we don't reverse the recent trend in the USPTO.
Now, the addition of Samsung, a Korean company, to the list of Rambus toadies might seem noteworthy and surprising, except that Samsung is the one dramurai which actually has an important business relationship with Rambus, being the only one currently producing RDRAM in any quantity (except Toshiba's RDRAM production for PS2). Thus they actually have a reason to fear the loss of their RDRAM license, and thus there are entirely sensible reasons why extortion should work on them, unlike the Japanese who give in to extortion for entirely cultural reasons. Indeed, it's sort of amazing--and a commentary on the validity of Rambus' claims--that Samsung actually held out this long.
When Micron and Infineon license these patents it will mean that they actually have merit (or that they've lost in court). Until then, this is just business as usual, and another sign of how the current patent regime is stifling progress and innovation.
Please tell me this statement isn't serious. I'm so sure that people that order CD's from a club are doing it with the express purpose of ripping off musicians.
That was the point of the statement--few people realize they're ripping off the artist when they buy from a CD club, but everyone has been "educated" into feeling guilty for downloading from Napster. But the reality of things is ironically quite the opposite--Napster costs the musician nothing and ends up selling more of his CDs, whereas CD clubs cost the musician a lot and sell very few of his CDs if any.
One is excellent free publicity, the other terrible publicity that the artist pays a lot for and the labels profit from sleazily. I wasn't accusing people who use CD clubs of being immoral--I used to use them before I knew the business model involved--just uninformed.
Napster isn't really a good example of true p2p filesharing (like gnutella, freenet, etc).
Keep in mind that Napster requires a central server to function, and that it is completely controlled by a corporation. I don't think this has any real implications for p2p.
Yes, that's true. The server doesn't necessarily have to be controlled by a corporation (duh), but as long as servers cost money most of them likely will. (Don't forget OpenNap, though!)
The same principles hold, though. If it is "illegal" for a corporation to grep a list of hyperlinks for a given string, it is presumably illegal for an individual too. Besides, Gnutella (and presumably Freenet; haven't used it) scales very poorly, and is somewhat unusable for the medium-term. The obvious solution--cache servers--runs into the same legal issues as Napster.
Yes, I realize that there are important legal differences--Napster collects hyperlinks from many people and searches them, as opposed to Gnutella, which only searches your own. (Freenet is like Napster in this regard, though.) And, again, I realize that this does not legally set a precedent.
It's just that this Napster case was probably our best opportunity to have a definitive legal ruling on the technologies involved. Among them: the right to collect user-submitted hyperlinks without checking each for copyright violation; the right to return the results of a grep on those links; the right to exercise fair use in a manner which is not easily distinguishable from infringement to someone assisting in that use (and the right to assist when you don't know if use is fair or infringing). Finally, it might have been nice--say--to have a definitive ruling on whether noncommercial sharing of copyrighted music is fair use (as per the AHRA, hundreds of years of copyright law) or infringing (as per some asinine act snuck through Congress to help prosecute MP3 sharing college students, I forget the name).
Now we'll have to wait, and when these issues come up, it's unlikely that we'll have the full force of a society which almost universally uses and supports Napster (amongst those with access to and knowledge of it)--or the legal expertise of a David Boies--behind us. It's a sad commentary, but if 35 million people were enthusiastic users of DeCSS, that trial probably would have gone differently. If these issues are decided upon in some "fringe" case--like DeCSS--rather than the Napster case, the results are more likely to be similar.
(Also I'm sad at the prospect of losing my Napster, since I will certainly refuse to pay out of principle.)
Correct me if I am wrong, but isn't BMG one of the CD clubs that you purchase a dozen CDs for some price and you don't have any further financial commitments (as compared to some others where you have to purchase a certain number of full-price CDs within a certain time frame)? They might keep it free and load you up on ads or make money selling demographic information about you instead.
BMG is a major record label. They run the BMG CD club, just like Columbia, another major label, runs Columbia House.
Ironically, it is these CD clubs, not Napster, which are stealing from artists. When you get your 12 CDs for 1 penny, those CDs don't just come free from the sky--they come directly out of the artist's pockets.
Yes, that's right. Not only is the artist not making any money (like Napster), they are actually losing it when you order their CD from a CD club. All those CDs are chalked up as "marketing costs", and billed to the artist--along with recording costs, studio time, tour costs, and other promotional costs. (The musician pays every last cent of the cost of recording and selling their album, but the label, not the artist, owns the copyright on their work.) Meanwhile, the label--not the artist, mind you--makes a huge profit by tricking people into paying for all those extra CDs that come along with membership. (For those who don't know, you don't have to pay for anything you don't order; just send it back.)
And finally, unlike Napster, no one who rips off a musician by ordering their CD through a CD club ever goes out and pays for it, because they already have the real thing. A disgusting practice, all in all--one which Napster was helping to end.
Does it bother anyone else that the concept of peer-to-peer file transfer just settled out of court? Yeah, I know, a settlement doesn't set a legally binding precedent. But something tells me we can all thank Napster for selling our rights down the drain.
Of course, it's hard to deny that a settlement probably is in Napster's best interests. Maybe this just means that we can't let corporations fight for our civil rights; they are not citizens, and thus will almost always have little to lose by giving up.
Yes, we're still free to fight on our own, but this is going to take a lot of momentum away. With P2P fragmented amongst a dozen different networks, it's going to be hard to be able to point to something and say, look, if 35 million people engage in a behavior, then by any concept of a social contract based government it cannot possibly be illegal. What are the chances David Boies is going to work pro-bono for Gnutella?
Ugh. Well maybe the rest of the big 5 will be typically shortsighted and this will all fall through, and we'll finally get this decaying mess of an anachronistic copyright system hauled in front of the Supreme Court. Or maybe this is better; maybe it's best that it doesn't get that far until the costs of letting media conglomerates rewrite the copyright laws become abundantly clear to everyone.
And it doesn't play VCDs, SVCDs and the like, unlike most $170 DVD players nowadays.
But it has better DVD quality than those $170 players; better, even than most $500 players. (This is going on reviews I've read, not my own personal experience.) You may need VCD support and a 5-disc changer, but most people only care about quality and price.
So the only good reason to get a PS2 anytime soon is for the (you guessed it) games.
Well, the people in Japan who have only bought an abysmal average of 1.8 games/ PS2 sold seem to disagree with you. Of course, that number will rise once the games stop sucking so much.
Interesting comments.
You may find the DC has better graphics for most titles and Dead or Alive 2, which is on both platforms, looks much better on the DC.
It's not as direct a comparison, but Quake 3 on the DC looks better and runs smoother than Unreal Tournament for the PS2. And while this was true for Q3 and UT on the PC, it's fair to say that the DC Q3 comes closer to the original than UT on the PS2 does, an impressive statement considering Q3 only really pulled ahead on high-end PC's. The PS2 is clearly the more powerful machine as a simple sum-of-its-parts, but half the VRAM and much less ease-of-programming are apparently hurting it quite a bit.
Sony has a terrible software ratio (1:1.8) which makes the $188 hit on each PS2 more difficult. And as you know, the $$$ come from software sales.
Why is Sony losing on the PS2? Easy, the PS2 happens to be one of the cheap(est) DVD players on the market.
Sony has apparently forgotten all the lessons they so ably taught Nintendo and Sega with the launch of the PS1. With the PS2, Sony has copied Nintendo's restrictive licensing, the Saturn's difficulty-of-programming, and both Nintendo and Sega's head-in-the-sand arrogance. Hell, they've even copied 3DO and the Philips CD-i with their harebrained scheme to turn the PS2--which doesn't even ship with a modem--into the mythical set-top-entertainment-center-information-super-on-ramp of Convergence Past, Present and Future. Instead, as you insightfully point out, they may only succeed in losing a hell of a lot of money selling cheap DVD players to people with little intention of playing games. Sony is obviously betting that those people will justify the high purchase price to themselves as cheap-for-a-DVD-player, and then start buying games on the justification of well-I-already-own-the-console. Of course, in order for that to work, the games have to be fun, which they currently aren't. The question is how quickly they'll become so.
This is going to be a damn interesting round of the video-game wars, perhaps the most interesting yet. Objectively, Sega ought to be in a fine position--decent DC sales, great price, PS2 shortages, online gaming outta the box, and a crop of games which matches the PS2 graphically and bests it in gameplay. But they're losing money, and most importantly, they've lost hype. Hardcore gamers love the DC--but they've already got one. For everyone else, the only thing that's gonna get them to buy a DC is that--as you suggest--they go to the store looking for a PS2 and find out there are none. I'm not so sure that this is the sort of thing you want to build a market strategy on.
And then there's the XBox. So far, MS is playing the role of Sony in the last round--listen to developers, make the machine easy to program, snap up as many big-name titles as you can. Of course the big difference is in timing--the PS1 came out second, but only because Sega rushed the Saturn launch, with disastrous results. The XBox is coming late, which is held out by some as the fatal mistake of the N64. But with the lateness should be a corresponding technical superiority, something N64 didn't have. Plus it'll have a ton of top-tier 3rd party games, another fatal weakness of the N64.
It used to be everyone ridiculed the XBox as misguided, bloated, underpowered vaporware. Nowadays the only place you run into those opinions is Slashdot, and less and less even here. Time and Newsweek are still sold on the PS2 hype, but developers appear to have moved on, and regularly gush about the XBox. I'm sure we all hope the latter group is more important in the long run.
And then, of course, there's the GameCube. Well, it's nice, and it comes in cute colors. To me it just screams XBox-lite--more powerful and easier than the PS2, but not as powerful or easy as XBox. eDRAM is some pretty hot technology, but still expensive and difficult to fab. Frankly, I don't trust it in the hands of ArtX any more than I would the Bitboys (Oy!). And I just don't think the rest of the system is going to be up to snuff, especially by the time it launches.
What Nintendo has going for it is some hot properties--Mario, Zelda, Metroid, DK, Pokemon. But while some great games have been made out of these, they're in a shrinking niche of the gaming industry, as the power of technology is allowing video games to become much more complex and appeal to an audience far beyond 9-14 year old boys. Meanwhile, MS seems to have miraculously gotten a share of or stolen outright all the great games which were once reasons to look forward to a PS2--Halo, MGS2, Oddworld, Crash, etc. I've heard EA is about to be signed, if they're not already. About all Sony has left is Square, and we'll see for how long.
So if I had to guess, I'd go with the XBox as the victor, the DC as becoming a small but solid success for a Sega desperately in need of that, the NGC as being the same for a Nintendo with rather greater aspirations for it, and the PS2 as garnering significant marketshare but without earning Sony either the profits or the influence it has apparently decided are inevitable.
But we'll have to see.
Despite Microsoft's continued insistence, it's a Windows PC in console packaging and as such is very easy to develop for, since so many people know DirectX.
No, it's a console which runs an embedded-NT kernel and Direct X. It doesn't have a motherboard in the traditional PC sense, but rather a mainboard like a console, with both the CPU and GPU sharing the 6.4 GB/s bus to RAM in a UMA architecture. Yes, the CPU and GPU share the same basic internal core as a PIII and NV20, but they are custom chips, not off-the-shelf. Alright, it's closer to a PC than any other console, but it's not just a Windows PC with different form factor.
And it's not just easier than PS2 for those who know Direct X already: to program the PS2 you need to essentially write balanced assembly code for 2 vector processors all while streaming textures into the tiny 4MB VRAM fast enough to keep up with the action. The libraries that ought to be around to help developers do basic tasks are apparently rather scarce. Meanwhile, Direct X, whatever one may think of it, is certainly a far superior solution, even for someone who has never used it before. That's why in addition to PC developers making the cross-over, there's a whole lot of console-only developers on that list.
When I use Napster, I only share about half my MP3s (still over 2 GB, and generally my more interesting stuff), because if I share them all it takes abominably long to log in.
When I use Gnutella, I often don't share at all, because my CPU utilization goes very high when I do, and then I can't listen to the new MP3s I'm getting without skips. (I assume this is due to my computer needing to check every search string that comes through against my list of shared files.)
Both of these problems are fixable with increased bandwidth and computing power. (Or maybe I just have a buggy version of Gnutella.) I'm very enthusiastic about the possibilities of P2P, and I genuinely try to share as much as possible. While I realize not everyone on Gnutella or Napster is as idealistic, I have a feeling the percentage who are is a good bit higher than the 2% (or whatever) reported. Of course you can't blame CNet for taking the "corporate whore" view of human nature, but in my experience people like to share with each other, and will especially do so whenever it is easy and doesn't have noticeable drawbacks.
Fortunately for Intel, they didn't have to take any risks, since every single one of the things you mentioned was done by someone else first. Hell, the Alpha alone did all of them before Intel did. Not one of these technologies was "in its infancy" when Intel deployed them.
The only risk Intel takes in deploying any of these technologies is the risk that Intel customers won't buy them. That's the risk every company takes when introducing a new model. While yes, it means Intel is taking risks, none of the risks Intel takes actually advance the state of the art.
That's just because he came up with a bad list. Despite the fact that there are very few totally new ideas in the MPU industry (just as there are very few totally new software algorithms), Intel has indeed bet the farm (well, bet the product line) on some very radical design ideas, both in the past and the present.
Some were successful, some crashed and burned. One design that was extraordinarily innovative and successful was the P6 core, introduced in 1995 with the PPro. In it, Intel managed to do "the impossible"--execute variable-length x86 code out-of-order, something that was supposed to be only possible with a fixed-length ISA and was even relatively state-of-the-art there. The way they did this was by essentially "emulating" x86 code by decoding it into internal "RISC-like" ops, which could be run OOO. While I doubt this was an entirely new idea, I'm not aware of any previous implementations of it, much less one as wildly successful as the P6.
One design that was a horrid failure was the iAPX432, an MPU spread out over 3 chips which essentially operated in an object-oriented manner, rather than iteratively like, well, every other chip in history. Perhaps a sign of what was to follow was the fact that the 432's "assembly code" was actually built to closely model Ada, the government's ill-fated OO language. The 432 somehow managed to work, but performed a bit slower than mainstream MPUs from 5 years beforehand. Not too many sold. But there is no doubt that here Intel took a huge risk based on a very interesting idea.
Nowadays Intel is engaged in exactly the same "risky" design behavior in an attempt to further the state of the art. The P4 contains several totally new innovations. Perhaps most prominent is the trace cache, an L1 instruction cache which instead of just dumbly storing instructions, orders them safely and unrolls loops, allowing branch- and dependency-free operation for large swaths of code. In addition, the trace cache stores those internal "RISC-like" ops, not x86 ops like a normal instruction cache; this takes the x86->"RISCop" decoder out of the critical path and should result in higher top-clock-speeds and excellent performance on small looped code which can fit in the L1 trace cache--3D engines, encryption, and FFT (i.e. audio/video encoding/decoding, voice recognition), for example. Trace caches are not a new idea; they've apparently been studied quite a bit in the literature. However, the P4 is the first commercial MPU to include one, and that's a substantial engineering innovation.
Another innovation which is, from what I've heard, actually a totally new idea is the P4's double-pumped ALU and supporting hardware. While the idea of different pieces of hardware running at multiple speeds is of course not new, this is apparently the first time it's been worthwhile to implement it on-die in a commercial MPU. More impressive is the fact that Intel was actually able to get an ALU--one of the most studied logic circuits in history--to run at up to 4.0 GHz in current .18 um process technology. Apparently the way they did this is by implementing a new, lower-latency adding technique. This is the circuit-design equivalent of finding, for example, a faster sorting algorithm; it represents a very impressive achievement. While the double-pumped ALU will likely not have as large an effect on overall P4 performance as the trace cache, it should help out noticeably and it's definitely a radical design.
On the other hand, we have Intel's upcoming IA-64 ISA, an attempt to move the VLIW philosophy from specialized DSP work into general-purpose computing. Again, VLIW is not a new idea, and the idea of a VLIW general-purpose MPU is not either. However, the Itanium is one of the first attempts to actually build one (Transmeta's Crusoe is the other).
Furthermore, it represents quite a risk from a performance standpoint. The basic idea behind VLIW is to in effect take the RISC revolution one step further. While the RISC vs. CISC debate is often treated as a fair fight capable of producing one victor, the reality was quite different. (The following is essentially a synopsis of this excellent article on ArsTechnica.) Instead, each was the best ISA philosophy for the prevailing conditions at the time. CISC was the best design choice for its time--that is, up until the early 80's--and "pure RISC" the best for its time--from the mid 80's until the mid 90's.
The main issues involved the evolution of storage capabilities and compiler technology. First a broad comparison of what CISC and RISC actually mean: CISC refers to a category of ISAs in which a new instruction is conceived of to take care of every possible situation. A (made-up) example of a CISC-like instruction is the following:
CRAZY_OP, mem1, r1, mem2
which does the following: load mem1 from memory, take r1 from a register on the chip, compute (mem1 - r1) / r1^2, and store that in mem2. And there actually were some CISC instructions which were nearly that crazy. The RISC philosophy, on the other hand, would break that one operation down into many--one to load mem1 to a register, one to subtract mem1-r1, one to multiply r1*r1, one to divide the two, and one to store the result, for a total of...lessee...5 instructions.
What's the difference? Well, like I said, it came back to storage capabilities and compiler technology. Back in the 70's when CISC was the Right Thing To Do, storage was extremely expensive and thus very scarce. If chips back then had used my RISC design, such an operation would have taken 5 instructions to code; with my CISC design, it takes just one. Yes, the CISC design might need to reserve some extra bits in the opcode field in order to code for so many ridiculous instructions, but overall the compiled RISC code is going to take at least 4 times as much storage space as the CISC code. So even if you didn't expect to run into the above situation very often, it made sense to have an explicit code name for it whenever you did.
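For anyone who wants to poke at the made-up CRAZY_OP example, here's a rough Python sketch of the two styles. Everything in it is hypothetical: the dictionary "memory", the temporary registers t0 through t3, and the opcode names are only there for illustration and don't correspond to any real ISA.

```python
# Toy model of the made-up CRAZY_OP example: one CISC-style instruction
# versus the five RISC-style steps it breaks down into.
# All names here (t0..t3, the opcodes) are hypothetical.

def crazy_op(memory, regs, src, reg, dst):
    """CISC style: a single instruction loads, computes and stores."""
    m = memory[src]
    r = regs[reg]
    memory[dst] = (m - r) / (r * r)

# The same work spelled out as five simple instructions.
RISC_PROGRAM = [
    ("load",  "t0", "mem1"),      # t0 <- memory[mem1]
    ("sub",   "t1", "t0", "r1"),  # t1 <- t0 - r1
    ("mul",   "t2", "r1", "r1"),  # t2 <- r1 * r1
    ("div",   "t3", "t1", "t2"),  # t3 <- t1 / t2
    ("store", "mem2", "t3"),      # memory[mem2] <- t3
]

def run_risc(program, memory, regs):
    """Interpret the simple instructions one at a time."""
    for op, *args in program:
        if op == "load":
            regs[args[0]] = memory[args[1]]
        elif op == "sub":
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "mul":
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "div":
            regs[args[0]] = regs[args[1]] / regs[args[2]]
        elif op == "store":
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")

def demo():
    """Run both versions on the same inputs and report the results."""
    cisc_mem = {"mem1": 10.0, "mem2": 0.0}
    crazy_op(cisc_mem, {"r1": 2.0}, "mem1", "r1", "mem2")
    risc_mem = {"mem1": 10.0, "mem2": 0.0}
    run_risc(RISC_PROGRAM, risc_mem, {"r1": 2.0})
    return cisc_mem["mem2"], risc_mem["mem2"], len(RISC_PROGRAM)
```

Both versions compute the same value; the only difference is that one of them takes a single entry in the program and the other takes five, which is the storage trade-off being argued about.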
As we hit the 80's, these storage issues rapidly eased, to the point where it wasn't such a hardship taking 4 times as much space to say the same thing every once in a while. Meanwhile, back in the CISC way of doing things, you actually needed to find some way to make your chip capable of performing all the goofy instructions that might be asked of it. In essence, it's almost like your assembly code is "compressed" to save storage space, and thus needs to be "decompressed" by the chip. This means complicated chip implementations, each trying to do more in each clock cycle--which means lower top clock speeds. The RISC chip may need more cycles to perform all 5 instructions, but since it only performs simple instructions, it can have a higher clock speed and thus come out ahead.
But there's a problem with this too: people generally like to program in high-level languages. RISC is a low-level ISA philosophy. Thus you need to have good compilers, to be able to analyze high-level instructions and decompose them into all their composite parts for encoding in a RISC assembly language--often a more difficult process than in my example. Again, the compilers of the 70's weren't up to the task; only in the 80's did good enough compilers come along to enable this. In essence, we moved the "decompression" of a high-level instruction to its low-level constituent operations from inside the chip (CISC) to in the compiler (RISC).
Thus, we went from CISC being a Good Thing to RISC being a Good Thing. The main changes were 1) code bloat was no longer such a big deal and 2) more of the instruction scheduling duties could move to the compiler.
Since that time, we've moved from what I called "pure RISC" to what Hannibal in the article I'm summarizing calls "post-RISC". That is, people started realizing that with RISC operations being more-or-less uniform, a good way to make things go faster was to do more than one thing at a time, and that instead of sitting and waiting on a long memory access, etc., you could switch and do other stuff in the meantime. Thus we got superscalar and out-of-order execution, respectively.
Moreover, we got deeper and deeper pipelines--sort of like assembly lines, in which each instruction goes through several stages, each 1 clock long, in its execution. This means we can clock the chip faster (less to do on each clock cycle), and get overall faster performance (think a fire brigade of 10 people each passing buckets a short distance, vs. one person running 10 times as far between buckets delivered). The problem is that, unlike buckets or trucks, code has dependencies; instruction 2 might take as its input the result of instruction 1, which is still in the pipeline--only halfway down the assembly line, as it were. Thus we need rescheduling logic to keep our pipeline stuffed--our assembly line filled--with instructions which don't depend on each other. Or, instruction 1 might be a branch instruction, which goes one way or another based on its result, so that we don't know "what comes next" until it is completely finished. Thus we use branch prediction, which uses some statistical methods to guess what comes next, and execute it accordingly, while aware that if when we get to the end of instruction 1 it turns out something else came next, we need to go over and do that instead.
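To give a feel for the "guess what comes next" part, here's a minimal Python sketch of one common textbook scheme, the two-bit saturating counter. That's my choice of example, not something from the article, and I'm not claiming it's what any particular chip uses. The class and function names are made up for illustration.

```python
# Two-bit saturating counter branch predictor (a standard textbook scheme,
# used here only as an illustration; names are hypothetical).

class TwoBitPredictor:
    def __init__(self):
        # Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
        # Start at 2, i.e. weakly "taken".
        self.counter = 2

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def count_mispredictions(outcomes):
    """Feed a sequence of actual branch outcomes to the predictor and
    count how often its guess was wrong."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

A loop branch that is taken nine times and then falls through, `count_mispredictions([True] * 9 + [False])`, costs a single wrong guess at the exit, whereas a branch that strictly alternates defeats this simple scheme about half the time. Each wrong guess is where the chip has to throw away the work in the pipeline and "go over and do that instead".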
The result of all this out-of-order superscalar pipelined "post-RISC" stuff was much higher IPC (Instructions executed Per Clock), but also lots of complicated logic on MPUs to handle all the scheduling and dependency checking and prediction. Theoretically, just as all the complicated logic made CISC chips complicated and slow, all this complicated logic makes today's post-RISC chips too complicated, too large, too hot, and slower than they might otherwise be. [end summary]
Thus, the basic idea behind VLIW is an extension of the idea behind the CISC->RISC transition. To wit: why not take all this complexity out of the MPU and put it back into the compiler? That way, we can get rid of all the unpleasantness once, at compile time--on the developer's time, not the user's. The way it does this is by trying to find all the parallelism, work out all the dependencies, and predict all the branches at compile time--in other words, to do all the scheduling at compile time. The way it communicates this to the chip, then, is to compile not to individual instructions for the chip to schedule, but rather into prescheduled "bundles"--or "Very Long Instruction Words"--which are supposed to be guaranteed to work well when run together in parallel.
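Here's a very rough Python sketch of what "prescheduling into bundles" might look like at compile time. It's a deliberately naive in-order packer under my own assumptions: instructions are (destination, sources) pairs, the bundle width of 3 is arbitrary, and any register hazard inside a bundle (including write-after-read, to be conservative) forces a new bundle. Real VLIW compilers reorder instructions and do far more than this.

```python
# Naive compile-time bundler: pack consecutive instructions into fixed-width
# bundles, starting a new bundle whenever an instruction would conflict with
# one already in the current bundle. All names here are hypothetical.

def bundle(instructions, width=3):
    """instructions: list of (dest, [sources]) pairs, in program order.
    Returns a list of bundles, each a list of instructions that have no
    register hazards between them."""
    bundles = []
    current = []
    written = set()  # registers written by the current bundle
    read = set()     # registers read by the current bundle
    for dest, srcs in instructions:
        raw = any(s in written for s in srcs)  # needs a result from this bundle
        waw = dest in written                  # overwrites a result from this bundle
        war = dest in read                     # overwrites an input (conservative)
        if current and (len(current) == width or raw or waw or war):
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append((dest, srcs))
        written.add(dest)
        read.update(srcs)
    if current:
        bundles.append(current)
    return bundles

# The arithmetic part of the earlier CRAZY_OP breakdown, as (dest, sources).
CRAZY_STEPS = [
    ("t0", ["mem1"]),      # load
    ("t1", ["t0", "r1"]),  # subtract
    ("t2", ["r1"]),        # multiply r1 * r1
    ("t3", ["t1", "t2"]),  # divide
]
```

Running `bundle(CRAZY_STEPS)` puts only the subtract and the multiply in the same bundle, since everything else is waiting on an earlier result. That's the basic difficulty: a bundle is only as full as the independent work the compiler can find.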
Or rather, this is how VLIW works where it is normally used--in DSP type processors, running programs for which it is very easy to extract this sort of data at compile-time. Problem is, it is much more difficult to do with general-purpose programs, which is why it hasn't been done before. As you might guess, there's just too much you don't know at compile-time for you to get unambiguous scheduling information. Transmeta solves this problem by compiling at run-time, using their code-morphing software, essentially a JIT compiler. The problems with this are obvious and well known: namely, that the JIT compiler uses resources which would otherwise go to running the program, and that you don't get the VLIW benefit of doing all the optimization once and forgetting about it. (The code-morphing software caches, profiles, and further optimizes the code it's already run, but it is still always running, and doesn't save this information from session to session.) Indeed, you're essentially moving the scheduling problem from one which is done by specialized on-chip logic in different pipeline stages than the execution logic--and thus not competing for execution resources--to one which is run by the general-purpose execution logic; a shaky trade-off at best. On the other hand, by working in software you theoretically get more flexibility to schedule instructions than when doing the scheduling with a chip's fixed logic.
The way IA-64 handles this problem is to have the compiler insert "hints" about which instructions look like they *might* be able to run in parallel, without dependencies; which way a branch is *likely* to go; which scheduling is *likely* to make good use of the chip's execution resources. The problem with this is, as the hints are inevitably going to be wrong, the chip needs its own analogues of much of the scheduling hardware it was trying to get rid of in the first place. In some ways, it's little more than a change in terminology: with OOE designs you have a smallish general register set with a large set of "rename registers", so that each instruction running in parallel essentially thinks it has a full copy of the general register set all to itself; with IA-64, you just have a huge general register set so that each parallel instruction has enough registers to work with.
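For the "rename registers" half of that comparison, here's a toy Python sketch of the idea. It assumes an unlimited supply of physical registers named p0, p1, and so on, which real hardware obviously doesn't have, and the function name and instruction format are made up for illustration.

```python
# Toy register renaming: every write gets a fresh physical register, so two
# instructions that reuse the same architectural register name no longer
# conflict. Assumes an unlimited pool of physical registers (hypothetical).

def rename(instructions, arch_regs):
    """instructions: list of (dest, [sources]) using architectural names.
    Returns the same instructions rewritten to use physical registers."""
    # Initially each architectural register maps to its own physical register.
    mapping = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed = []
    for dest, srcs in instructions:
        # Sources read whatever physical register currently holds the value.
        new_srcs = [mapping[s] for s in srcs]
        # The destination gets a brand-new physical register.
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], new_srcs))
    return renamed

program = [
    ("r1", ["r2", "r3"]),  # r1 <- r2 op r3
    ("r4", ["r1"]),        # r4 <- uses r1 (a true dependency)
    ("r1", ["r5", "r6"]),  # r1 reused for unrelated work
]
renamed = rename(program, ["r1", "r2", "r3", "r4", "r5", "r6"])
```

After renaming, the first and third instructions write different physical registers, so the third no longer has to wait for the first two, while the true dependency of the second on the first is preserved. IA-64's approach, as described above, is to expose a big enough architectural register set that the compiler can do the equivalent bookkeeping itself.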
The problem is, of course, that you haven't done what you set out to do--eliminate complex scheduling logic from the processor. Instead, you've just replaced it with similar but less-well understood versions of the same stuff. The end result is that the Itanium core, far from being small, simple and clocking fast, is huge, complex, unbalanced, and therefore capable of pitiful clock speeds. The die is ~300mm^2--roughly 3 times the size of a P3--yet only has room for a total of 16KB L1 and 96KB L2 cache, less than even a lowly Celeron. (Server level chips like Itanium generally need much *larger* caches than PC chips; Itanium is supplemented with a large off-chip L3 cache, but it is too high-latency to be much use.) Itanium was supposed to launch in early 1998 at 800MHz; it is only now yielding above 733MHz--again, Celeron territory.
Furthermore, we run into trouble from an unexpected place--code bloat. Of course, it's not the same problem as in the 70's, when we used CISC ISA's to keep code small so that they could be stored at all; today's 100GB HD's testify to that. Rather the problem is that *bandwidth* to storage is very often the limiting factor with today's technologies, and that high-bandwidth storage--i.e. on-chip cache--is just as scarce as overall storage was in the 70's. With all its hints and bundling and exception codes to execute if the hints turn out wrong, IA-64 is much more bloated than x86 or RISC code, and thus those not-even-Celeron sized on-die caches are effectively even smaller.
Of course, Itanium has more functional units than the P6 core, and if all these compiling tricks actually keep them full of instructions, it will perform much better per clock. Unfortunately, all indications are that even with the relaxed "hints instead of guarantees" rule, it's still just too difficult for today's compiler technology to keep this monster even remotely well-fed. Intel even had the gall to claim at their recent Intel Developer Forum that the SPEC CPU benchmarks were "irrelevant" for Itanium's target market, offering instead a (hand-written in assembly) RSA encryption benchmark in which Itanium demolished a Sun USIII. Well, that's fine, except that a very cheap dedicated encryption chip can beat the Itanium at this game several times over for 1% the cost and power requirements. Of course, the SPEC benchmarks run exactly the sorts of programs used in Itanium's target market, and are the most relevant measure possible. And not coincidentally, they are extremely sensitive to compiler quality.
So...to get back to our original topic, IA-64 is another huge risk--Intel has repeatedly called it a "bet-the-company thing"--which incorporates some very interesting, non-mainstream ideas in an attempt to radically advance the state of the art. And so far, it appears not to be working.
Don't worry too much about Intel, though; from all indications, McKinley, the 2nd-generation IA-64 core, should perform just fine thank you. Interestingly enough, it was designed almost entirely by HP engineers. But it also must be emphasized that they have clearly learned from Intel's myriad mistakes with Itanium. (Everything about Itanium, from the pitiful tacked-on caches to the rather unnatural pipeline design--apparently an extra stage needed to be added late in the design process--indicates that this design was a "learner".) Plus, Itanium has been delayed so long that the almost-on-schedule McKinley is due out relatively soon--roadmaps have it as soon as Q4 2001 (dubious), and Q1 2002 might actually be reasonable. McKinley should clock just fine (although not as high as the CISC front-end P4), and has plenty of on-die cache. And in a year and a bit, the compilers might finally be ready too.
So, Intel might just turn this risky strategy into gold. Maybe the "post-RISC" paradigm *will* run out of gas soon, and VLIW will speed past. The point is, for better or worse, Intel's MPU designers are not conservative in the least.
AMD, on the other hand, has never introduced any significant new MPU design techniques that I can think of; instead they concentrate on implementing Intel's designs better than Intel. Indeed, their first PC MPUs had the same names as Intel's--AMD made a "386" and a "486", and possibly a "286" too, I don't remember. The much-vaunted K7 is really quite similar to the P6 core, just with more functional units, larger buffers, more decoders--more more. It's a better version of the P6 (though less power efficient), but it's not terribly innovative. Of course, AMD was in a precarious enough position market-wise that they didn't have the luxury of taking engineering risks. Intel being relatively secure (and perceiving themselves--Andy Grove's catchy business-trade bestsellers notwithstanding--as even more so), they can and do experiment with some wacky stuff. Some of it works, some of it doesn't, and for some of it they take their massive market power and force it to work.
Way to read the article, folks. "And at its labs in China, the company is developing an embedded operating system and emulation software that would run PC and Sony Playstation games on the RISC chip"
/. once again.
This is a general-purpose, 300MHz RISC chip. There is no way in hell it could emulate the special-purpose, 300MHz SIMD-based Emotion Engine, nor its dual-channel RDRAM memory.
Nor is there any reason why ALi would ever want to do so. All gaming consoles, including the PS2, are sold at a loss. The only reason Sony wants to sell PS2s is that they make money on licensing fees for each game sold. The console is a loss leader. There is no way to make a PS2 for less than $300; if there was, Sony would sell them for less than $300. Therefore it is a very simple conclusion that this chip emulates the PS1--simple enough to be emulated without incurring additional cost--not the state-of-the-art PS2.
Shame on
I meant, from the launch of the P4 until...until whenever. For the next month-and-a-bit, AMD most definitely has the highest-performing x86 chip around, bar none--as it arguably has for the last 11 months or so. (Intel's highest speed grades have been available in such laughable quantities that the benchmarkers were the only ones to get hold of them; thus it's arguable whether they ought to count any more than the 1.13 does now--i.e. not at all.)
First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.
/., but it was the very definition of a fluff piece--like much of /., now that you mention it.
/. But he is just a student, not an expert.
/. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.
/.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)
/. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)
/. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.
.18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.
...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???
/. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.
.13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)
/. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.
This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.
1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.
2) This was neither review nor rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.
3) Humorously enough, the "self-promoting" Hannibal link I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on
4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to
And, just FYI, I have read every single article on microprocessor design that has passed by the
ROFLMAO!
You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.
Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most
Second it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from
No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here, here, here, here, and here. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to
Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.
You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?
Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willammette. That's why the Willamette performs slower clock-for-clock than a theoreticl P!!! at the same clockspeed
First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you read my previous post at all, you would realize these would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.
And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's
Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.
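For anyone who wants the arithmetic spelled out: to a first approximation, performance is clock speed times IPC, so the two claims above multiply. A quick sketch using only the ratios quoted in this post (the function name is mine, and the roadmap and IPC figures are projections, not measurements):

```python
def relative_performance(clock_ratio, ipc_ratio):
    """First-order model: throughput scales as clock speed times IPC."""
    return clock_ratio * ipc_ratio

# "scale 100% better" on an identical process means twice the clock;
# 15-20% better integer IPC means a factor of 1.15 to 1.20.
low = relative_performance(2.0, 1.15)
high = relative_performance(2.0, 1.20)
print(f"projected integer speedup over P6: {low:.2f}x to {high:.2f}x")
```

So if both projections hold, the integer speedup over a P6 on the same process lands somewhere around 2.3x to 2.4x. The model ignores memory and everything else outside the core, so treat it as a ceiling on what those two numbers imply rather than a prediction.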
Again, don't get me wrong: the Athlon is clearly clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.
Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.
BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.
Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you?
HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!
Ok, I'm over it now. Phew.
Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it, wholesale, lock, stock, barrel, from Compaq where it has been in use for quite some time now. On the one hand, it was a good business decision because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+ BTW, but don't worry it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.
Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
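The bandwidth figures above fall straight out of transfer rate times bus width. A small sketch, assuming a 64-bit data path for both the FSB and the SDRAM channel (the width is my assumption; the post only gives the resulting GB/s numbers) and taking PC133 as a flat 133 million transfers per second:

```python
def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfers_per_sec * (bus_width_bits / 8) / 1e9

fsb = peak_bandwidth_gb_s(200e6, 64)    # EV6 bus at an effective 200 MHz
dram = peak_bandwidth_gb_s(133e6, 64)   # PC133 SDR SDRAM

print(f"FSB {fsb:.2f} GB/s, DRAM {dram:.3f} GB/s, "
      f"DRAM can fill {dram / fsb:.0%} of the FSB")
```

Under those assumptions the DRAM can feed only about two thirds of what the bus could carry, which is the sense in which the extra FSB bandwidth has been wasted so far.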
Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.
But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes.
But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.
More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.
Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)
Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.
First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die-space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I were doing scientific computing, I would go with a cluster of Durons in a heartbeat.
But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.
And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.
Uhhuh. That's why the Katmai P3--with its half-speed 512 Kb L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512Kb L2--is so much better than TBird??
You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!
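To put numbers on the bus-width point, here is a rough sketch. The 64-byte cache line is a hypothetical round figure chosen for illustration, and treating the P4's L2 path as 256 bits wide is my assumption; only the 64-bit, 256-bit and 48 GB/s figures come from the posts above.

```python
def beats_per_line(line_bytes, bus_width_bits):
    """Bus transfers needed to move one cache line (ceiling division)."""
    bus_bytes = bus_width_bits // 8
    return -(-line_bytes // bus_bytes)

def implied_clock_hz(bandwidth_bytes_per_s, bus_width_bits):
    """Clock rate needed to reach a given bandwidth at one transfer per cycle."""
    return bandwidth_bytes_per_s / (bus_width_bits / 8)

LINE_BYTES = 64                               # assumed line size
narrow = beats_per_line(LINE_BYTES, 64)       # 64-bit L2 bus
wide = beats_per_line(LINE_BYTES, 256)        # 256-bit L2 bus
p4_clock = implied_clock_hz(48e9, 256)        # what 48 GB/s implies at 256 bits

print(narrow, wide, p4_clock / 1e9)
```

On those assumptions the narrow bus needs four times as many transfers to move the same line, and 48 GB/s over a 256-bit path works out to one transfer per cycle at 1.5 GHz.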
Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at a much coarser granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
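A back-of-the-envelope version of that argument, with loudly labeled assumptions: the 10 ms scheduler timeslice and the "1% of cycles hit L2" rate are hypothetical round numbers I picked for illustration, not measurements of any OS or workload. Only the 1 GHz clock and the 7 vs. 11 cycle latencies come from the discussion above.

```python
CLOCK_HZ = 1_000_000_000   # "over one billion cycles per second"
TIMESLICE_S = 0.010        # assumed 10 ms scheduler quantum (hypothetical)

def cycles_per_timeslice(clock_hz=CLOCK_HZ, timeslice_s=TIMESLICE_S):
    """How many core cycles elapse between context switches."""
    return round(clock_hz * timeslice_s)

def extra_latency_cycles(l2_hits, slow_latency=11, fast_latency=7):
    """Additional cycles spent waiting when every L2 hit costs 11 cycles, not 7."""
    return l2_hits * (slow_latency - fast_latency)

quantum = cycles_per_timeslice()
l2_hits = quantum // 100               # assume 1% of cycles issue an L2 hit
penalty = extra_latency_cycles(l2_hits)

print(f"{quantum:,} cycles per timeslice; "
      f"latency gap costs {penalty:,} cycles ({penalty / quantum:.0%})")
```

The point of the sketch is the ratio of timescales: a context switch happens once per ten million cycles or so, while the latency difference is paid on every single L2 hit in between.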
And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...
No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.
If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up clean-sheet kick-butt design, but was radically scaled back by Sanders less than a year ago. Head Designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--eg. RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.
As for the recent
Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?
How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a
Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since, well as it turned out, just before the peak late this summer. And yes, like you too I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.
Don't worry, I dislike Intel plenty a lot. I cheer for AMD, and make no bones about it.
What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool
Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.
Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.
I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same-old wishful thinking and oversimple analysis as fact.
I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.
I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here, here, here and here.
Also you should check out the tech forum at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff.
Tell me, what is so innovative about _kludging_ old crap to high speeds and adding more superfluous, useless instructions? I say nothing. Innovation is by definition "something new or different".
That's like looking at a new race car and saying it's no better than a Model T because it's just "kludging old crap to high speeds". Now, if it had 5 wheels, then you'd be getting somewhere! That's innovation!!
Or like saying any new computer is still the same old crap kludged to high speed because it's all binary. If only we switched to ternary logic, that'd be innovation!! Or because all architectures these days use 8-bit bytes. What's wrong with 10, or 37? The only reason we use 8-bit bytes is for backwards-compatibility with extended ASCII, which is obsolete anyways! Why not move to 16-bit bytes with Unicode?! That'd be innovation! That's new!
Geez. It's not like the architects at Intel (or AMD, or Compaq, or anywhere for that matter) couldn't think up a new ISA in their sleep. It's just a dumb idea.
The Merced might've been an impressive chip, but why the f*sck do the still have keep dragging that 8086 shit behind?
Because the point of a MPU is to run programs, not to look beautiful on paper. The vast majority of the world's programs run only on x86; thus the vast majority of processor marketshare is going to be x86-compatible. Besides, if you actually knew and understood the amazing ways Intel (especially) engineers have managed to squeeze out performance despite working around the design constraints of x86-compatibility, you wouldn't run around calling an amazing core like the P6 a "kludge". Indeed, you obviously don't realize this, but the most important ingredient in getting a chip to high clock speeds is an elegant balanced design. (Manufacturing process is a close 2nd, and is why the Alpha, a more elegant design, is outclocked by the PIII and Athlon.)
I think we should go more into parallelism in programs and take the advantage of multiple, perhaps a bit slower, processors, not one huge frying pan.
Well that's wonderful that you think that. Unfortunately, instruction-level parallelism is very very difficult to extract from most computer code. Furthermore, the increased complexity of SMP buses and SMP motherboards makes the SMP option too expensive to hit the mainstream market for the foreseeable future.
Instead what we'll see is CMP--chip level multiprocessing, essentially having multiple processor cores on one die. Examples include the IBM POWER4 and prolly the AMD K8. This increases chip-to-chip bandwidth and gets rid of the motherboard costs, but doesn't solve the problem of low ILP in most code. A more interesting solution is SMT, simultaneous multithreading, which allows a single superscalar core to work on instructions from several different threads in parallel. Early indications are that SMT provides phenomenal performance boosts; Sun's embedded MAJC chips use SMT, but the first general purpose SMT MPU might be the Compaq Alpha EV8.
Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.
Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to
The article itself doesn't say anything the knowledgeable don't already know.
This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependency scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you troll?)
In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.
Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.
Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good old-fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in
I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.
Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.
ArsTechnica, like
So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's.
Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?
In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.
Oh really. Just like preproduction benchmarks of the K7 proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII led to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):
Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4s are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what we know about the P4's design. Thus the logical conclusion is that, just like the preproduction MPUs "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4s we have seen "benchmarked" on the web so far have been sandbagged.
Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.
"But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.
Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.
Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 are scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.
The main culprit, of course, is the difference in L2 latencies. Tbird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384KB total cache; Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles with 256KB total cache. The Tbird has the bigger cache because the cache design is exclusive; however, it also has much longer latencies for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
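To put rough numbers on the latency argument, here's a back-of-the-envelope sketch in Python. The L2 latencies (11 and 7 cycles) are the ones quoted above; the L1 hit time, both hit rates and the main memory latency are made-up illustrative values, not measurements, and giving both chips the same hit rates ignores the Tbird's larger exclusive cache.

```python
def avg_access_cycles(l1_hit, l1_hit_rate, l2_latency, l2_hit_rate, mem_latency):
    """Average memory access time, in cycles, for a two-level cache hierarchy."""
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    # Every access pays the L1 hit time; L1 misses also pay the L2 latency,
    # and L2 misses additionally pay the trip to main memory.
    return l1_hit + l1_miss_rate * (l2_latency + l2_miss_rate * mem_latency)

# Assumed values for illustration only (NOT from the article or any benchmark).
L1_HIT = 3          # cycles
L1_HIT_RATE = 0.95
L2_HIT_RATE = 0.90
MEM_LATENCY = 100   # cycles

# L2 latencies as quoted in the post: 11 cycles (Tbird) vs. 7 cycles (Coppermine).
tbird = avg_access_cycles(L1_HIT, L1_HIT_RATE, 11, L2_HIT_RATE, MEM_LATENCY)
coppermine = avg_access_cycles(L1_HIT, L1_HIT_RATE, 7, L2_HIT_RATE, MEM_LATENCY)

print(f"Tbird:      {tbird:.2f} cycles per access")
print(f"Coppermine: {coppermine:.2f} cycles per access")
```

With these assumptions the gap per access is just the L1 miss rate times the 4-cycle latency difference, so it grows for code that misses L1 more often. It says nothing about the bus-width difference, which would need a separate bandwidth model.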
Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.
"The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. 
But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.
The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a
It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note: WOW), but also *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4s are released and benchmarked.
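The clock-vs-IPC claim is easier to weigh with the basic relation performance = IPC x clock speed. A tiny Python sketch, with purely hypothetical ratios (no real benchmark data, and the function names are just for illustration):

```python
def relative_performance(ipc_ratio, clock_ratio):
    """Throughput relative to a baseline core: instructions/sec = IPC * clock."""
    return ipc_ratio * clock_ratio

def break_even_ipc_ratio(clock_ratio):
    """IPC ratio at which a faster-clocked core merely ties the baseline."""
    return 1.0 / clock_ratio

# Hypothetical scenarios: a core at 2x the baseline clock whose IPC is
# 10% lower than, equal to, or 10% higher than the baseline's.
for ipc_ratio in (0.9, 1.0, 1.1):
    perf = relative_performance(ipc_ratio, 2.0)
    print(f"IPC x{ipc_ratio:.1f} at 2x clock -> {perf:.2f}x baseline performance")
```

The break-even helper makes the point from the other side: at double the clock, a core would have to lose half its IPC before it merely tied the baseline.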
Blah blah blah, biased statements towards Ace's.
Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to