AMD Talks About Internal Benchmarks for Opterons
ggruschow writes "AMD's CTO says their 2.0-Ghz Opteron (aka Hammer) beat a 2.8-Ghz Xeon (P4) on both SPECint2000 and SPECfp2000 tests, but was mixed against an Intel 1-Ghz Itanium 2 (details at
ExtremeTech). IBM predicted "conservative" 1.8-Ghz PowerPC 970 scores, which fall in the middle of the pack (sweet for OS X). It's probably not a coincidence that AMD's news comes so soon after Gartner said x86-64 would fail. Even if Intel loses the performance crown again, their upcoming mobile processor is looking pretty spiff with its recently announced 1MB of cache. Sounds like next year might finally bring a worthy upgrade for my 486dx4-160."
You're weak my friend ;)
You've got no holding power... hell, I've still got my Commodore 64 with acoustic coupler modem, and I'll hold onto it until I see something worth spending money on...
Be you Admins? nay, we are but lusers!
For straight CPU intensive tasks it matters.
But for 99% of normal people's tasks, 10% won't matter.
But it's the edge and it has to be somewhere and it has to move.
My rule is that I upgrade when I can get a CPU that is twice as fast as my old one for about 1000 DKK (about $130).
That's possible right now (I have an 850MHz Celeron), but I need a new motherboard, which kind of changes the rules.
TC - My Photos..
Benchmarks are as bad as statistics. They measure nothing but how much you can tweak your CPU and compiler to fit that specific benchmark.
I would say that AMD may have an advantage for being more backwards compatible than Itanium, but I also feel that it is time for a change!
All major CPU manufacturers make proper RISC CPUs already, so why don't we find them in our ordinary computers? It is because the Windows codebase cannot simply be recompiled for a new target but has to be ported function by function (a painful assignment, to say the least). Perhaps they can reuse 3/4 of the code, but still, there is a whole lot of rewriting and verification to do.
I have worked in a Tru64 environment (running Alpha CPUs) and I was surprised at how easy it was to get 95% of the Linux apps to properly compile and run. I didn't try to get Linux itself running, but I had gcc running and that was enough.
What I'm trying to say is that the open source movement has proven that one can write portable code successfully and that it is time to make a hardware change. The serial ATA and AGP solutions from the PC are good enough, and so is the PCI bus (lots of peripherals available), so I wouldn't change that. Simply make the standard computer run multiple RISC CPUs and a proper multi-threaded OS that can take advantage of them, and you'll have a performance boost that would make the P4 look like a bicycle compared to an F1 car (ok, perhaps a Porsche, but still, an F1 does 0-200kph in
While I'm on the subject: as we have bochs, it would still be possible to run Windows in a VM, no matter what platform we use, so all M$ users could be happy. Or do as Acorn did (does) and have a PC as an extension card, i.e. run a PC natively in a window and just use the *fast* RISC CPU for any real work.
Clawhammer (Athlon) has a single 16-bit-wide HyperTransport bus.
:)
The workstation Sledgehammer (Opteron) has two 16-bit buses.
The server Sledgehammer (Opteron) has three 16-bit buses.
The SPEC results are as follows:
SPECint
PIII 1 GHz              426
G4 1 GHz                306
G5 (IBM PowerPC 970)    937
P4 2.8 GHz             1010
Athlon XP 2800+         933
Itanium 1 GHz           810
Power4 1.3 GHz          804
Clawhammer 2.0 GHz     1202
SPECfp
PIII 1 GHz              426
G4 1 GHz                187
P4 2.8 GHz              947
Athlon XP 2800+         782
Itanium 1 GHz          1356
Power4 1.3 GHz         1169
Clawhammer 2.0 GHz     1170
Opteron??? Presumably higher than Clawhammer, considering the multiple HyperTransport buses, 1 or 2 MB of L2 (compared to Clawhammer's 256 or 512 KB of L2) and dual on-chip DDR memory controllers compared to Clawhammer's single memory controller.
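For anyone who would rather see ratios than eyeball the list, here is a quick sketch that uses only the P4, Itanium and Clawhammer numbers posted above. The dictionary layout and function names are my own, and the "per GHz" figure is simply the score divided by the listed clock, not an official SPEC metric.

```python
# Scores copied from the list above: (clock in GHz, SPECint, SPECfp).
SCORES = {
    "P4 2.8 GHz": (2.8, 1010, 947),
    "Itanium 1 GHz": (1.0, 810, 1356),
    "Clawhammer 2.0 GHz": (2.0, 1202, 1170),
}
INT, FP = 1, 2  # column indexes into the tuples above


def ratio(chip_a, chip_b, column):
    """Score of chip_a divided by score of chip_b for the given column."""
    return SCORES[chip_a][column] / SCORES[chip_b][column]


def per_ghz(chip, column):
    """Score divided by clock speed (an informal normalization, not a SPEC metric)."""
    return SCORES[chip][column] / SCORES[chip][0]


if __name__ == "__main__":
    hammer = "Clawhammer 2.0 GHz"
    for rival in ("P4 2.8 GHz", "Itanium 1 GHz"):
        print(f"{hammer} vs {rival}: "
              f"int x{ratio(hammer, rival, INT):.2f}, "
              f"fp x{ratio(hammer, rival, FP):.2f}")
    for chip in SCORES:
        print(f"{chip}: {per_ghz(chip, INT):.0f} SPECint per GHz")
```

On these numbers the Clawhammer is ahead of the P4 on both tests and ahead of the Itanium on integer only, which matches the "mixed" result in the story.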
Bootleg Powerpoint Presentation:
http://130.236.229.26/download/misc/AMD-Opteron
and
http://a26.lambo.student.liu.se/download/misc/A
Read the Show notes! AMD failed to edit them out
The filename is AMD-Opteron.ppt; search Google for it.
It includes a system that is an Opteron workstation paired with a Clawhammer that still presents itself as a single-processor system. The Clawhammer acts as a math coprocessor.
If voting were effective, it would be illegal by now.
Your knowledge isn't that good. The fastest 486 in terms of MHz was the AMD 5x86 133MHz (4*33MHz) chip. That chip easily overclocked to 160MHz (4*40MHz). In terms of Pentium performance (integer-wise) it was equivalent to a P75 at 133MHz and to a P90 at 160MHz (give or take a few percent).
In terms of performance, the fastest chip that fitted in a Socket 3 was the Cyrix 5x86 120MHz, which (again speaking of integer performance) was equivalent to a P100.
Thomas S. Iversen
I hope THIS mask rev of Opteron (Hammer) chip will be faster than January 2002 PowerPC G4 chips.
Currently, according to the RC5 benchmarks, AMD is far slower than dual-CPU Macintoshes (half as fast). (Source is available for the core RC5 loops for most processors.) RC5 was silently completed in June or so, but a bug went unnoticed for a couple of months; the contest is over now. They measured performance in units of "Mac PowerBooks" in their press releases.
The Mac Dual 1 Ghz g4 is faster than all existing dual AMD motherboards in RC5 benchmark by almost 100%.
21,129,654 RC5 keyrate for dual 1 Ghz g4 system ! And Now apple sells dual 1.25 Ghz stock which would be even faster.
A dual 1800+ AMD MP gets only HALF as many as a Mac! 10,807,034 rc5 keys !
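To put a number on "almost 100%", here is a small sketch that works only from the two key rates quoted above. The variable names are mine, and the calculation says nothing about why the gap exists.

```python
# Key rates quoted in the post (RC5 keys per second, as reported there).
g4_dual_keys = 21_129_654         # dual 1 GHz G4
athlon_mp_dual_keys = 10_807_034  # dual Athlon MP 1800+

# How many times faster the G4 system is, and the same figure as a percentage.
speedup = g4_dual_keys / athlon_mp_dual_keys
percent_faster = (speedup - 1) * 100

print(f"speedup: {speedup:.2f}x, {percent_faster:.1f}% faster")
```

The result lands a little under a factor of two, so "almost 100%" faster is a fair description of these two figures.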
Funny "Mhz myth" there showing itself I guess... Apple now is selling even FASTER machines but with smaller caches and less fast read-write ram (it now uses DDR on newest boxes).
And the macs are using low power g4 chips meant for microcontroller usages with very little predictive branching and a simple 7 stage RISC pipeline depth. (macs complete many many instructions per cycle though, unlike Pentiums).
The Mac I mentioned uses a 2 MB L3 cache, and no AMD MP dual-CPU boards I know about have any L3 cache at all, so maybe that is why some common Macs are over twice as fast; it's not just meager AltiVec tweaks to RC5. AMD has similar, but less amazing, vector ops.
Another reason the Mac might be over twice as fast as an AMD dual MP board is not just the 2 MB L3 cache but the fact that the Mac can read and write to a cold page of memory simultaneously FASTER than any AMD MP designs, which are biased for linear access and streaming. Many memory scatter benchmarks show this too. Apple's newest DDR-RAM machines might not offer this feature though.
So basically, will the new Hammer systems be able to get close to speed for RC5 and other crypto tasks as the RISC based Powerpcs?
I really want to know. And I am so sad to see Slashdot reduced to fanboys modding down anything discussing tech subjects like this as "flames" all the damned time. This post is all informative and factual and my reason for asking is genuine.
http://www.research.ibm.com/journal/rd46-1.html has 5 LARGE technical articles on how the POWER4 chip was designed... in PDF form too. Even if you do not appreciate the Power4 (which apple is using a dual-core version of in many months) you might want to read these PDFs because they are all about chip design.
They put the floating point on the corners of the chip die to help spread heat, etc. Hundreds of interesting facts and pictures at that site.
Top500.org lists Power3 dominating the cluster speeds of the top 500 computer clusters for memory+float speed. Power4 will soon start appearing in that list as well as the "lite" version with only 2 MB of cache instead of 4,6, and 16 MB.
Plus the new chip Apple will start using, announced yesterday, will have SIMD "VMX" or Velocity Engine added (Moto calls theirs "AltiVec")... only 90% of AltiVec's hundreds of opcodes will be offered though.
With Pricewatch showing cheapest 800Mhz Itanium bare cpu at almost 8 THOUSAND dollars, and 3.5 thousand for the old itanium 700 Mhz, it does not take a financial genius to see why apple's workstations are selling so well nowadays.
For straight CPU intensive tasks it matters.
But for 99% of normal people's tasks, 10% won't matter.
10% never matters. We regularly run simulations here that take a month. What is 10% on top of a month? 3 days. If you have already been waiting 30 days, what does another 3 matter? It probably corresponds to the weekend anyway.....
How many people are "we"?
;)
If you are ten people, one of them could be fired, by your argument, without anybody noticing.
Let me turn it around: how many percent do you need before it matters? 12? 15?
But I agree, one can't upgrade every time there's a 10% speed increase. One has to do the cost/benefit thing carefully first (and then ignore the c/b and just spend, spend, spend: the only way to get the economy back on track).
TC - My Photos..
1) It's mostly written in C/C++
2) The HAL (Hardware Abstraction Layer) contains most of the platform-specific code. As I understand it, the kernel does not actually handle the hardware directly
Of course, I can see it going like this:
1) Apple, Intel, AMD and Motorola put forward new chip designs
2) They ask MS to support it with their OS
3) MS picks Intel
--
$ vi any_article_on_iraq
:%s/iraq/microsoft/gi
:%s/Weapons of mass destruction/Windows/gi
:%s/Axis of evil/Redmond/gi
:%s/In this post september 11 climate/Service Pack 1/gi
:%s/Bush/Linux/gi
:wq
486dx4-160? No wonder you crazy linux folks hate windows. You haven't bought a computer since 1995.
At work I've got a 49000 line Microsoft Visual C++ project that compiles in 5.5 minutes on a 1700 MHz Pentium 4. That's right, about 150 lines per second.
Turbo Pascal used to compile at thousands of lines per second on machines with a clock nearly two orders of magnitude slower that took several cycles per instruction instead of running several instructions per cycle.
Before you say something like "hey, but modern compilers have optimizations yadda yadda", perhaps I should mention that this compilation time was with no optimizations and with features like updating browser files disabled. With optimization it's even slower.
We're talking about four orders of magnitude difference in efficiency here. It's not all the compiler's fault, of course. The libraries and code use complex templates and multiple levels of definitions that make the compiler work much harder.
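A rough back-of-the-envelope sketch of that comparison follows. The Visual C++ figures are the ones given above. The Turbo Pascal figures (2000 lines per second on a 20 MHz machine) are placeholders I picked to match "thousands of lines per second" and "nearly two orders of magnitude slower", not measurements, and the function names are mine.

```python
import math


def lines_per_second(lines, minutes):
    """Compile throughput in source lines per second."""
    return lines / (minutes * 60)


def lines_per_megacycle(lps, clock_mhz):
    """Lines compiled per million clock cycles (lines/s divided by MHz)."""
    return lps / clock_mhz


# Figures from the post: 49000 lines in 5.5 minutes on a 1700 MHz Pentium 4.
vc_lps = lines_per_second(49_000, 5.5)
vc_per_mcycle = lines_per_megacycle(vc_lps, 1700)

# ASSUMED placeholder figures for Turbo Pascal (not from the post, not measured).
tp_lps = 2000.0
tp_clock_mhz = 20.0
tp_per_mcycle = lines_per_megacycle(tp_lps, tp_clock_mhz)

# Gap in work done per clock cycle, expressed in orders of magnitude.
orders = math.log10(tp_per_mcycle / vc_per_mcycle)

print(f"Visual C++: {vc_lps:.1f} lines/s, {vc_per_mcycle:.4f} lines per Mcycle")
print(f"Turbo Pascal (assumed): {tp_per_mcycle:.1f} lines per Mcycle")
print(f"Per-cycle gap: about {orders:.1f} orders of magnitude")
```

With these placeholder values the per-cycle gap comes out at roughly three orders of magnitude; the fourth would have to come from the cycles-per-instruction difference mentioned above, which this sketch does not model.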
At each one of these layers someone probably said "It's OK if this is 10 times slower. It's easier to write and maintain, I'm more productive (or lazy) and the CPU is fast enough". Each one of these decisions may be justified *in itself*, but they add up (or rather multiply up) to a 1/10000 difference in efficiency. Slowing the edit/compile/debug cycle reduces programmer productivity and code quality. Reduced code quality leads to more code bloat and an even slower edit/compile/debug cycle, and so on.
Damn, it's depressing.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
I don't pretend to feel the difference between 2.0GHz and 2.1GHz. I don't "feel the difference" when going from a HD with 3x20gb platters to 2x30gb platters. I don't feel the difference between PC3200 and PC2700.
But I do feel it when I upgrade from an outdated system to a new one. And to know what kind of performance I could get for a reasonable* (*as defined by me ;) ) price, I do need to know what the state of the art is.
Maybe that isn't relevant to you, maybe your 486 / Pentium / Duron / Space heater does what you want it to when you check your email and type up your word document, but not for all of us. I know a few tasks where I'd like 4gb+ of memory, solid-state SATA drive and a multi-GHz proc+, or a dual, for that matter.
Large strides are best made one small step at a time. This is just another one of them.
Kjella
Live today, because you never know what tomorrow brings
Yeah, but only in the way that no one NEEDS modern medicine, central heating, or citrus fruit during the winter.
On the other hand, I NEED faster than a Duron/600 for:
sending messages in ICQ (yup, sending a message is O(n) or O(n^2) - not sure which) with n the number of messages in your scrollback
Encoding MP3s - I spent over 2 hours this afternoon switching CDs every 10-15 minutes.
Recording TV - I can only record to divx at quarter VGA or less
Using Mozilla the way I want (with 20-50 tabs open at a time and 128M of RAM cache)
Using an encrypted filesystem (unless win2k's implementation is just horribly inefficient)
Opening / manipulating 500M images
Sure, I could plop an XP2200+ in here, but I spent $50 on the original CPU and I'm unwilling to spend more on another until Hammer comes out. A dual Clawhammer should be about 10-20x as fast as my current machine depending on app - a most satisfying upgrade.
High-speed Road Trip (18.000KPH)