AMD Talks About Internal Benchmarks for Opterons
ggruschow writes "AMD's CTO says their 2.0-GHz Opteron (aka Hammer) beat a 2.8-GHz Xeon (P4) on both SPECint2000 and SPECfp2000 tests, but was mixed against an Intel 1-GHz Itanium 2 (details at ExtremeTech). IBM predicted "conservative" 1.8-GHz PowerPC 970 scores, which fall in the middle of the pack (sweet for OS X). It's probably not a coincidence that AMD's news comes so soon after Gartner said x86-64 would fail. Even if Intel loses the performance crown again, their upcoming mobile processor is looking pretty spiff with its recently announced 1MB of cache. Sounds like next year might finally bring a worthy upgrade for my 486dx4-160."
Who cares what processor is slightly slower or faster than others? You need at least a 10% difference in overall system performance to notice anyway.
Darn, I missed fp by thinking...
You're weak my friend ;)
You've got no holding power... hell, I've still got my Commodore 64 with an acoustic coupler modem, and I'll hold onto it until I see something worth spending money on...
Be you Admins? nay, we are but lusers!
Benchmarks are nice and all, but I'm getting kinda tired of hearing how great a CPU benches for about 6 months before I could buy one even with a sack full of money.
Not that I'm not excited about 64-bit CPUs on the desktop; I could really find a use for one (I've got something interesting that likes to malloc more than 4GB sometimes).
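To make the 4GB point concrete: malloc() takes a size_t, and on a 32-bit build size_t tops out just under 4 GiB, so a bigger request can't even be expressed, let alone satisfied. A minimal C++ sketch; the helper names and the 5 GiB figure are hypothetical, chosen only for illustration:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: can this build's size_t represent a request of
// `bytes`? On a 32-bit target the maximum is 2^32 - 1, so anything larger
// would wrap around before malloc() ever saw it.
constexpr bool fits_in_size_t(std::uint64_t bytes) {
    return bytes <= static_cast<std::uint64_t>(
                        std::numeric_limits<std::size_t>::max());
}

// Width of a data pointer on this build, in bits (32 or 64 on common targets).
constexpr std::size_t pointer_bits() { return sizeof(void *) * CHAR_BIT; }

// Hypothetical "more than 4GB" request: 5 * 2^30 bytes.
constexpr std::uint64_t kFiveGiB = std::uint64_t{5} << 30;
```

On a typical 64-bit build `fits_in_size_t(kFiveGiB)` is true; on a 32-bit build it is false, which is the whole argument for 64-bit desktops in one line.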
Introducing the new Occam Fusion! Now with sqrt(-1) fewer blades!
I want lots of cache and extreme memory bandwidth. As CPUs are getting faster and faster, both the lack of cache and memory access are seriously limiting the performance of current PC architectures. Yet, not even Intel seems to be interested in improving those areas. In fact, with P4 Intel actually cut the amount of cache.
Damn, I tried to mod "funny" and it entered it as "overrated". Stupid wheel mice.
This
No one cares about a few percent of performance gain anymore. And even if Opteron were much faster, people wouldn't care much, simply because you can't buy it. The Pentium 4 is better because you can get it *NOW*.
If you need new computer, buy it (NOW!), otherwise don't buy anything until you need it.
Obviously there is a market for super-fast processors among those of us on /., but aren't we at a point where currently available processors are fast enough for more and more user segments? What I mean is, people who do Word and Excel were happy at about 800 MHz, and ordinary CAD people like me don't need more than about 2 gig. There are only two guys in my organization (running VHDL simulations day in and day out) who have any need for faster processors. Will we soon get to a point where the combined market of gamers and /. people won't pay for another processor spin?
Benchmarks are as bad as statistics. They measure nothing but how much you can tweak your CPU and compiler to fit that specific benchmark.
I would say that AMD may have an advantage for being more backwards compatible than Itanium, but I also feel that it is time for a change!
All major CPU manufacturers already make proper RISC CPUs, so why don't we find them in our ordinary computers? It is because the Windows codebase cannot simply be recompiled for a new target but has to be ported function by function (a painful assignment, to say the least). Perhaps they can reuse 3/4 of the code, but still, there is a whole lot of rewriting and verification to do.
I have worked in a Tru64 environment (running Alpha CPUs) and I was surprised by how easy it was to get 95% of the Linux apps to compile and run properly. I didn't try to get Linux itself running, but I had gcc running and that was enough.
What I'm trying to say is that the open source movement has proven that one can write portable code successfully, and that it is time to make a hardware change. The Serial ATA and AGP solutions from the PC are good enough, and so is the PCI bus (lots of peripherals available), so I wouldn't change those. Simply make the standard computer run multiple RISC CPUs with a proper multi-threaded OS that can take advantage of them, and you'll have a performance boost that would make a P4 look like a bicycle next to an F1 car (OK, perhaps a Porsche, but still, an F1 does 0-200kph in
While I'm on the subject: since we have Bochs, it would still be possible to run Windows in a VM no matter what platform we use, so all M$ users could be happy. Or do as Acorn did (does): have a PC as an extension card, i.e. run a PC natively in a window, and just use the *fast* RISC CPU for any real work.
ClawHammer (Athlon) has a single 16-bit-wide HyperTransport bus.
:)
The workstation Sledgehammer (Opteron) has two 16 bit busses
The server Sledgehammer (Opteron) has three 16 bit busses
The SPEC results are as follows:
SPECint2000:
  Pentium III 1 GHz                  426
  G4 1 GHz                           306
  G5 (IBM PowerPC 970) 1.8 GHz       937
  Pentium 4 2.8 GHz                 1010
  Athlon XP 2800+                    933
  Itanium 2 1 GHz                    810
  Power4 1.3 GHz                     804
  ClawHammer 2.0 GHz                1202
SPECfp2000:
  Pentium III 1 GHz                  426
  G4 1 GHz                           187
  Pentium 4 2.8 GHz                  947
  Athlon XP 2800+                    782
  Itanium 2 1 GHz                   1356
  Power4 1.3 GHz                    1169
  ClawHammer 2.0 GHz                1170
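Since these chips run at very different clocks, one crude way to read the SPECint list is score per GHz. A small sketch using only the rows above whose clock is stated (the Athlon XP is left out because its model number isn't its clock); the struct and function names are hypothetical, and the 970 entry is IBM's prediction, not a measurement:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the SPECint2000 list, typed in from the post.
struct SpecResult {
    std::string cpu;
    double clock_ghz;  // clock speed as stated in the list
    double spec_int;   // SPECint2000 score as quoted
};

// Score per GHz: a rough "work per clock" figure. It ignores cache,
// memory system and compiler differences, so treat it as a talking point.
double per_ghz(const SpecResult &r) { return r.spec_int / r.clock_ghz; }

std::vector<SpecResult> spec_int_list() {
    return {
        {"Pentium III 1 GHz", 1.0, 426},
        {"G4 1 GHz", 1.0, 306},
        {"PowerPC 970 1.8 GHz (predicted)", 1.8, 937},
        {"Pentium 4 2.8 GHz", 2.8, 1010},
        {"Itanium 2 1 GHz", 1.0, 810},
        {"Power4 1.3 GHz", 1.3, 804},
        {"ClawHammer 2.0 GHz", 2.0, 1202},
    };
}
```

On these quoted numbers ClawHammer works out to about 601 points per GHz, against roughly 361 for the 2.8 GHz P4 and 618 for the Power4, which is the "MHz myth" argument in numeric form.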
Opteron??? It should score higher than ClawHammer, considering the multiple HyperTransport busses, the 1/2 MB L2 (compared to ClawHammer's 256/512 KB L2) and the dual on-chip DDR memory controllers, compared to ClawHammer's single memory controller.
Bootleg Powerpoint Presentation:
http://130.236.229.26/download/misc/AMD-Opteron
and
http://a26.lambo.student.liu.se/download/misc/A
Read the show notes! AMD failed to edit them out.
The filename is AMD-Opteron.ppt; Google it.
It includes a system that is an Opteron workstation paired with a ClawHammer that still presents itself as a single-processor system. The ClawHammer acts as a math coprocessor.
If voting were effective, it would be illegal by now.
Your knowledge isn't that good. The fastest 486 in terms of MHz was the AMD 5x86-133 (4 x 33 MHz). That chip easily overclocked to 160 MHz (4 x 40 MHz). In terms of Pentium performance (integer-wise) it was the equivalent of a P75 at 133 MHz and of a P90 at 160 MHz (give or take a few percent).
In terms of performance, the fastest chip that fit in Socket 3 was the Cyrix 5x86 120 MHz, which (again speaking of integer performance) was the equivalent of a P100.
Thomas S. Iversen
I hope THIS mask rev of Opteron (Hammer) chip will be faster than January 2002 PowerPC G4 chips.
Currently, according to the RC5 benchmarks, AMD is far slower than dual-CPU Macintoshes (half as fast). (Source is available for the core RC5 loops for most processors.) RC5 was silently completed in June or so, but a bug went unnoticed for a couple of months; either way, the contest is over. They measured performance in units of "Mac PowerBooks" in their press releases.
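For anyone curious what those "core RC5 loops" compute: the RC5-64 contest searched for a 64-bit (8-byte) key for RC5-32/12/8, so each candidate key costs one key schedule plus one block encryption. Below is a plain, unoptimized C++ sketch of RC5-32/12/b written from Rivest's published description; the class and method names are mine. It shows the work is nothing but 32-bit adds, XORs and data-dependent rotates, which helps explain why hand-tuned per-CPU cores matter so much for keyrate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain, unoptimized RC5-32/12/b: 32-bit words, 12 rounds, b-byte key.
class RC5 {
public:
    static constexpr int kRounds = 12;                            // r
    static constexpr std::size_t kTableSize = 2 * (kRounds + 1);  // t = 26

    explicit RC5(const std::vector<std::uint8_t> &key) {
        assert(key.size() <= 255);  // RC5 allows keys of 0..255 bytes
        // Load the key bytes little-endian into c >= 1 words.
        const std::size_t c = key.empty() ? 1 : (key.size() + 3) / 4;
        std::vector<std::uint32_t> L(c, 0);
        for (std::size_t i = key.size(); i-- > 0;)
            L[i / 4] = (L[i / 4] << 8) + key[i];

        // Fill the subkey table from the magic constants P32 and Q32.
        S_[0] = 0xB7E15163u;
        for (std::size_t i = 1; i < kTableSize; ++i)
            S_[i] = S_[i - 1] + 0x9E3779B9u;

        // Mix the key into the table: 3 * max(t, c) steps.
        const std::size_t t = kTableSize;
        const std::size_t steps = 3 * (c > t ? c : t);
        std::uint32_t A = 0, B = 0;
        for (std::size_t k = 0, i = 0, j = 0; k < steps; ++k) {
            A = S_[i] = rotl(S_[i] + A + B, 3);
            B = L[j] = rotl(L[j] + A + B, A + B);
            i = (i + 1) % t;
            j = (j + 1) % c;
        }
    }

    // Encrypt one 64-bit block held as two 32-bit words, in place.
    void encrypt(std::uint32_t &a, std::uint32_t &b) const {
        a += S_[0];
        b += S_[1];
        for (int i = 1; i <= kRounds; ++i) {
            a = rotl(a ^ b, b) + S_[2 * i];  // rotate amount depends on data
            b = rotl(b ^ a, a) + S_[2 * i + 1];
        }
    }

    // Inverse of encrypt(), undoing the rounds in reverse order.
    void decrypt(std::uint32_t &a, std::uint32_t &b) const {
        for (int i = kRounds; i >= 1; --i) {
            b = rotr(b - S_[2 * i + 1], a) ^ a;
            a = rotr(a - S_[2 * i], b) ^ b;
        }
        b -= S_[1];
        a -= S_[0];
    }

private:
    static std::uint32_t rotl(std::uint32_t x, std::uint32_t n) {
        n &= 31;  // only the low 5 bits of the rotate amount are used
        return n == 0 ? x : (x << n) | (x >> (32 - n));
    }
    static std::uint32_t rotr(std::uint32_t x, std::uint32_t n) {
        n &= 31;
        return n == 0 ? x : (x >> n) | (x << (32 - n));
    }
    std::uint32_t S_[kTableSize];
};
```

A keysearch client does essentially this once per candidate key (set up the table, encrypt a known block, compare), and the key setup's 78 mixing steps dominate the 12 rounds of encryption.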
The dual 1 GHz G4 Mac is faster than all existing dual AMD motherboards in the RC5 benchmark, by almost 100%.
A dual 1 GHz G4 system gets an RC5 keyrate of 21,129,654 keys/sec! And now Apple sells a stock dual 1.25 GHz, which would be even faster.
A dual 1800+ AMD MP gets only HALF as many as a Mac: 10,807,034 RC5 keys/sec!
Funny "MHz myth" showing itself there, I guess... Apple is now selling even FASTER machines, but with smaller caches and slower read-write RAM (the newest boxes use DDR).
And the Macs are using low-power G4 chips meant for microcontroller use, with very little branch prediction and a simple 7-stage RISC pipeline. (Macs complete many, many instructions per cycle, though, unlike Pentiums.)
The Mac I mentioned uses a 2 MB L3 cache, and no AMD MP dual-CPU boards I know of have any L3 cache at all, so maybe that is why some common Macs are over twice as fast; it's not just meager AltiVec tweaks to RC5. AMD has similar, but less amazing, vector ops.
Another reason the Mac might be over twice as fast as an AMD dual MP board is not just the 2 MB L3 cache but the fact that the Mac can read and write to a cold page of memory simultaneously FASTER than any AMD MP design, since those are biased toward linear access and streaming. Many memory scatter benchmarks show this too. Apple's newest DDR RAM machines might not offer this feature, though.
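For the curious: the usual way to measure the random-access ("scatter") memory behaviour argued about here is a pointer-chasing loop. Every load depends on the previous one, so the CPU can neither overlap nor prefetch them, and the time per step approximates memory latency. A minimal C++ sketch; the function names are mine, and building the chain with Sattolo's algorithm is one common choice, not taken from any particular benchmark suite:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a random permutation forming ONE cycle through all n slots
// (Sattolo's algorithm), so a chase from slot 0 visits every slot once
// before coming back, and no short loop can stay cache-resident.
std::vector<std::size_t> make_chase(std::size_t n, std::uint32_t seed) {
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937 rng(seed);
    for (std::size_t i = n; i > 1; --i) {
        // j comes from [0, i-2], never i-1 itself: that restriction is what
        // forces a single cycle instead of an arbitrary permutation.
        std::uniform_int_distribution<std::size_t> pick(0, i - 2);
        std::swap(next[i - 1], next[pick(rng)]);
    }
    return next;
}

// Follow the chain for `steps` dependent loads. Returns the final slot so
// the compiler cannot discard the loop.
std::size_t chase(const std::vector<std::size_t> &next, std::size_t steps) {
    assert(!next.empty());
    std::size_t p = 0;
    for (std::size_t s = 0; s < steps; ++s) p = next[p];
    return p;
}

// Average nanoseconds per dependent load. Use an array far larger than
// the caches (tens of MB) to measure main memory rather than cache hits.
double ns_per_load(const std::vector<std::size_t> &next, std::size_t steps,
                   std::size_t &sink) {
    assert(steps > 0);
    const auto t0 = std::chrono::steady_clock::now();
    sink = chase(next, steps);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(steps);
}
```

Only the relative numbers between boxes mean much; run it once to warm up and once to measure.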
So basically, will the new Hammer systems be able to get close to the speed of the RISC-based PowerPCs for RC5 and other crypto tasks?
I really want to know. And I am so sad to see Slashdot reduced to fanboys modding down anything discussing tech subjects like this as "flames" all the damned time. This post is all informative and factual, and my reason for asking is genuine.
http://www.research.ibm.com/journal/rd46-1.html has 5 LARGE technical articles on how the POWER4 chip was designed... in PDF form, too. Even if you do not appreciate the Power4 (Apple will be using a dual-core version of it in a number of months), you might want to read these PDFs because they are all about chip design.
They put the floating point units on the corners of the chip die to help spread heat, etc. There are hundreds of interesting facts and pictures at that site.
Top500.org lists Power3 as dominating the top 500 computer clusters for memory+float speed. Power4 will soon start appearing in that list, as will the "lite" version with only 2 MB of cache instead of 4, 6, or 16 MB.
Plus, the new chip Apple will start using, announced yesterday, will have SIMD "VMX" (Velocity Engine) added (Motorola calls theirs "AltiVec")... only 90% of AltiVec's hundreds of opcodes will be offered, though.
With Pricewatch showing the cheapest 800 MHz Itanium bare CPU at almost 8 THOUSAND dollars, and 3.5 thousand for the old 700 MHz Itanium, it does not take a financial genius to see why Apple's workstations are selling so well nowadays.
Hey, he can probably even run Quake 1 with a decent framerate on minimum window size.
I did.
Beware: In C++, your friends can see your privates!
Pentium 4s have no shared cache: uniprocessor designs only.
If you want DUAL CPUs, or more, you have to go Mac or AMD to get speed per dollar.
And Macs are twice as fast as the fastest AMD in RC5 benchmarks.
A Pentium 4 is a heat-wasting joke once you start using 2 or more CPUs.
Apple is only selling dual-CPU machines now. And when the dual-core Power4 ships in 8 months or less, they might offer 4 CPUs economically as a stock product; even if they do not, many 3rd-party dual-CPU board suppliers for Macs exist, such as Sonnet Technologies.
WRONG! RISC "ordinary computers" exist!
You wrote "why don't we find them in our ordinary computers"!
In fact I am using one as I type this. It was built in 1996 (yes, nineteen ninety-six) and has an 800 MHz G4 accelerator in it from Sonnet.
It's my "internet" machine; I use other RISC machines, not wired to any external networks, for programming.
It runs a wonderful version of Microsoft Office at full speed (RISC) and launches MS Word in 2 seconds cold (yes, two seconds to a flashing cursor).
No Intel emulation needed.
It's called a Macintosh.
Millions of Macs exist, millions of Macs use one or more RISC processors, and almost no Mac people I know EVER want to emulate a PC running Windows if they can help it.
RC5 and other benchmarks are twice as fast on standard Macs as on AMD, and Pentium 4s have no multi-CPU board designs...
If you want to run thousands of high-end commercial shrink-wrapped products on RISC, you can, but only on a Macintosh. And they run very well in the new Jaguar 10.2 (though faster in 8.6).
1) It's mostly written in C/C++.
2) The HAL (Hardware Abstraction Layer) contains most of the platform-specific code. As I understand it, the kernel does not actually handle the hardware directly.
Of course, I can see it going like this:
1) Apple, Intel, AMD and Motorola put forward new chip designs
2) They ask MS to support it with their OS
3) MS picks Intel
--
$ vi any_article_on_iraq
:%s/iraq/microsoft/gi
:%s/Weapons of mass destruction/Windows/gi
:%s/Axis of evil/Redmond/gi
:%s/In this post september 11 climate/Service Pack 1/gi
:%s/Bush/Linux/gi
:wq
I got one of those 133 MHz chips to go up to 180 MHz on a Tomato board (PCI :-) by trying different jumper configurations. Stable? Wouldn't know, didn't run it long enough.
What we see depends on mainly what we look for. -- John Lubbock Now search for that bug slave!
486dx4-160? No wonder you crazy linux folks hate windows. You haven't bought a computer since 1995.
Palladium would have been Microsoft's price for x86-64 Windows. If MS develop that OS, then end users are going to see what Hammer can really do. If they don't, and everyone just runs XP in 32-bit mode, then all you have is a fast Athlon.
Early models will be able to deactivate Pd, anyway. When it becomes hardwired, that's the day I start looking at Apple and ARM.
Real Daleks don't climb stairs - they level the building.
I think the point of getting more powerful processors is not just for everyday use, but increasing the overall computing power in the world. Imagine getting back the results from Folding@Home in a week, rather than a couple years... sequencing genomes etc... There are very valid purposes for computationally powerful machines, just because WE don't know of any (in our daily lives), doesn't mean that there aren't any (hehe, agnostic argument).
:-) Good enough reason for me.
If someone were to say to me, that the number of kids on computers today doing the things they do was not directly related to computational power, I wouldn't believe them. The more power, the further the abstraction from what computers really are underneath, hence the broader user base.
If my old computer that my mom uses were 100x as powerful, it would be smart enough to go look online as to why it's having errors printing, and I'd never have to venture out of my cave in the basement
At work I've got a 49000 line Microsoft Visual C++ project that compiles in 5.5 minutes on a 1700 MHz Pentium 4. That's right, about 150 lines per second.
Turbo Pascal used to compile at thousands of lines per second on machines with a clock nearly two orders of magnitude slower, which took several cycles per instruction instead of running several instructions per cycle.
Before you say something like "hey, but modern compilers have optimizations yadda yadda", perhaps I should mention that this compilation time was with no optimizations and with features like updating browser files disabled. With optimization it's even slower.
We're talking about four orders of magnitude difference in efficiency here. It's not all the compiler's fault, of course. The libraries and code use complex templates and multiple levels of definitions that make the compiler work much harder.
At each one of these layers someone probably said "It's OK if this is 10 times slower. It's easier to write and maintain, I'm more productive (or lazy) and the CPU is fast enough". Each one of these decisions may be justified *in itself*, but they add up (or rather multiply up) to a 1/10000 difference in efficiency. Slowing the edit/compile/debug cycle reduces programmer productivity and code quality. Reduced code quality leads to more code bloat and an even slower edit/compile/debug cycle, and so on.
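The arithmetic, spelled out. Only the first line comes straight from the numbers above; the second plugs in my own guesses for the post's rough factors ("thousands of lines per second" taken as about 1,500, a clock "nearly two orders of magnitude" slower taken as 100, and several cycles per instruction versus several instructions per cycle taken as 3 × 3):

```latex
\begin{aligned}
\text{rate} &= \frac{49\,000\ \text{lines}}{5.5 \times 60\ \text{s}}
             = \frac{49\,000}{330} \approx 148\ \text{lines/s} \\
\text{efficiency gap} &\approx \underbrace{\tfrac{1500}{148}}_{\approx 10}
  \times \underbrace{100}_{\text{clock}}
  \times \underbrace{3 \times 3}_{\text{CPI vs. IPC}} \approx 10^{4} \\
10^{4} &= \underbrace{10 \times 10 \times 10 \times 10}_{\text{four ``10 times slower'' layers}}
\end{aligned}
```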
Damn, it's depressing.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
I don't pretend to feel the difference between 2.0GHz and 2.1GHz. I don't "feel the difference" when going from a HD with 3x20gb platters to 2x30gb platters. I don't feel the difference between PC3200 and PC2700.
But I do feel it when I upgrade from an outdated system to a new one. And to know what kind of performance I could get for a reasonable* (*as defined by me ;) ) price, I do need to know what the state of the art is.
Maybe that isn't relevant to you, maybe your 486 / Pentium / Duron / Space heater does what you want it to when you check your email and type up your word document, but not for all of us. I know a few tasks where I'd like 4gb+ of memory, solid-state SATA drive and a multi-GHz proc+, or a dual, for that matter.
Large strides are best made one small step at a time. This is just another one of them.
Kjella
Live today, because you never know what tomorrow brings
It's just mind boggling that people take them seriously...
Sticking feathers up your butt does not make you a chicken - Tyler Durden
I won't argue about a change from X86 being desirable either but....
IMHO Itanium just isn't the way to go. By some measure if X86 is warty, then Itanium most closely resembles Ben Grimm in his best orange. By other measures perhaps IA64 is a cleaner architecture, but it's proving to be a sonofagun to write compilers for. To me that portends a somewhat moribund future with a highly complex compiler on a highly complex architecture. Even incremental improvements, other than clock speed and cache size ramping will be difficult.
The living have better things to do than to continue hating the dead.
I think a better approach for the future is smaller, less power-hungry, modular CPUs. We've all seen the evidence of the clusters that make up supercomputers. What if all standard computers came with 4 CPUs that together used the same power as a P4 today? What if, instead of buying a newer, faster computer, you could add CPUs like expansion cards, at a reasonable price?
UNIX/Linux Consulting
Back when Pascal was prevalent...wait that never happened.
Anyway, twenty years ago people didn't write things modularly like they do today, so each recompile covered a bigger piece of the project.
Now we use modularity, so code is broken up into much smaller pieces. A recompile need only cover the file you're working on; the other 50 can just stay compiled as they are. Obviously 'make' was developed specifically to automate the decision of what needs to be recompiled.
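The heart of the decision make takes is a timestamp comparison: rebuild a target if it's missing or older than any of its prerequisites. A minimal C++17 sketch using std::filesystem; the function name is hypothetical, and real make also handles pattern rules, phony targets and recursive dependencies, which this ignores:

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Returns true if `target` (e.g. foo.o) must be rebuilt from `prereqs`
// (e.g. foo.cpp plus the headers it includes): it doesn't exist yet, or
// some prerequisite was modified more recently than it was.
// Throws fs::filesystem_error if a prerequisite is missing.
bool needs_rebuild(const fs::path &target, const std::vector<fs::path> &prereqs) {
    if (!fs::exists(target)) return true;
    const auto target_time = fs::last_write_time(target);
    for (const auto &p : prereqs) {
        if (fs::last_write_time(p) > target_time) return true;
    }
    return false;
}
```

Run over every object file in a project, this gives the "only recompile what changed" behaviour described above.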
Sure, it is much, much slower. But linking takes very little time, and compile time has been cut way down by previous compiles, almost enough to make up the difference (although, I admit, not quite). Still, your comparison is not the best: Pascal hardly has the power available in a bigger programming language, and since it's mostly been academic, not as much effort has gone into making the compiler really smart (and therefore slower). Perhaps you should talk about Fortran 77?
Mod me down and I will become more powerful than you can possibly imagine!
i think the new release of hammer lines will be very difficult for amd. intel is one step ahead. if you look right now, they are already announcing next-generation product lines on all fronts, like banias in cpus, ultra low voltage and integrated chips for small devices, and extremely high speed chips for network devices.
i believe intel has shifted its focus in the battle of the desktop cpus. while amd is just playing catch up, intel now is already looking at what consumers will benefit from. maybe intel has realized that the speed today is an overkill for majority of today's needs. they are just speeding up their chips to keep up with moore's law.
but look at their products: right now, they are focusing on making things smaller, lightweight, ultra low power consumption, low heat devices, integration. the future is not desktop computers requiring very high speed cpus but mobile devices such as phones, pdas, tablets, etc. intel will be a clear winner (if only i had humongous money so i could buy intel stock at a discount.)
they have good engineers that produce good results. right now, they are already producing better chipsets for their server product lines; maybe in a few years, they will no longer rely on broadcom's serverworks.
they are also picking up on their storage chips. of all the raid controllers on the market, i hardly see a card that does not have an intel i960 i2o processor or one of their new ixp processors.
their network and communication line is very dynamic, like introducing 10 gigabit products today (even with the downturn in telecoms.) enabling encryption and decryption at 10gb/s is no joke. maybe a few years from now, we will see intel chips in network gear from cisco, et al.
they are now focusing on wireless integration. few years from now, capacitors and resistors will be in a silicon chip. it is the future, and they are very lucky to realize that. when the economy recovers, intel will clearly be a winner.
and for servers, i would want to say this. i believe amd will produce a good cpu. but that is just half of the story: amd is not emphasizing any good chipsets/systems to come with it, including support for pci-x at 133mhz with hotplug slots, interleaved memory with chipkill(tm), good server management, good integration.
(as one who decides what to purchase in a server,) amd must make a lot of effort before i will take them seriously. their cpu is not enough for me to get their system, yet.
let's just wait and see, but i see that intel will always be a step ahead. now for amd, the challenge is to be at par or even be ahead of intel.
Live your life each day as if it was your last.
Your joke reminds me of the ancient Egyptian symbol for a large number. It was a man with his arms upraised, as if saying it's incomprehensible. It stood for 1,000,000.
Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
Intel bought DEC? :)
-- Jim
Ok, maybe they aren't quite the same thing yet, but the lines between the two have REALLY blurred.
Just take a look at any modern RISC processor. Chances are it has several hundred instructions, i.e. they sure haven't "reduced" that instruction set by any significant amount. Then if you look at any modern CISC processor, you'll find that it just decodes instructions into RISC-like ops internally. End result? The difference between RISC and CISC is REALLY small these days.
If you read about the design of the Power4 vs. the Athlon, you'll see that essentially ALL of the basic building blocks are the same; it's mainly just a matter of how many of those blocks there are and how they all fit together. If anyone thinks the Power4 is so fast clock for clock vs. the Athlon because of its instruction set, they probably just haven't noticed that this chip has tons of execution units, a HUGE cache and a shitload of bandwidth. All of these could potentially be added to a chip like the Athlon if the economics worked out.
Now, this isn't to say that x86 is without its flaws, but most of those flaws are rather minor and have been worked around in compilers for years. The two biggest problems are the small number of registers and the stack-based floating point unit. Well, Intel's SSE2 can now mostly replace the old floating point unit for the majority of tasks (though it typically isn't used as such yet), and AMD's upcoming Hammer/Opteron will double the number of registers available.
Well, they weren't lying about the clock speed. Incidentally, I noticed large gains in applications that depended on floating point performance while using that processor (as opposed to a friend's 486dx4-100), as well as higher floating point scores in benchmarks. It wasn't as fast as a P90... not quite (there were upgrade chips based on that brand of CPU which claimed to be equivalent to a P75).
I had one for many years. I'd say it would compare favorably to a stock P75.
Sounds like next year might finally bring a worthy upgrade for my 486dx4-160
I love it when people who never used pre-Pentium systems try to talk like they did. Everyone knows that a dx4 ran at 100 MHz.
They're saying that Barton will be here 1Q03, Sledgehammer is due 1H03, but now ClawHammer may be delayed until 2H03!
Arghh. I thought the point was to do a 64 bit CPU without requiring an Itanium schedule...
Why would anyone engrave "Elbereth"?
What do you want large caches for? Large caches aren't simply a trump card to be played to magically make all your applications faster. Misusing a cache can be detrimental to performance unless your cache is big enough to fit the entire application and all its data. Algorithmic enhancements can make a huge difference... in some situations, for certain types of data, a well-tuned version of an application running on a processor with 1/4 the cache size will outperform a poorly written version running on the processor with the bigger cache. All that being said, yes, larger caches will improve the execution of the majority of software (and certainly of algorithmically tuned software), but they are not the be-all and end-all solution.
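A standard example of an algorithmic enhancement beating raw cache size is loop blocking (tiling). Below, a hypothetical matrix transpose walks the matrix in small square tiles, so the lines it reads and writes get reused while still in cache; the naive version writes n elements apart on every step. Both produce identical output, and only the access order differs. The tile size of 32 is an assumption you would tune per CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive transpose of an n x n row-major matrix (out must hold n*n values).
// Reads walk along a row, but writes jump n elements apart each time.
void transpose_naive(const std::vector<double> &in, std::vector<double> &out,
                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
}

// Blocked transpose: finish one tile x tile square before moving on, so the
// cache lines it touches are reused instead of evicted and refetched.
void transpose_blocked(const std::vector<double> &in, std::vector<double> &out,
                       std::size_t n, std::size_t tile = 32) {
    assert(tile > 0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                    out[j * n + i] = in[i * n + j];
}
```

For matrices much larger than the cache, the blocked version typically runs noticeably faster on the same hardware, which is the point being made: how you touch memory can matter more than how much cache you have.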
The ultimate mobile processor should have a power saving mode that runs slower and won't burn your lap. My main prob with laptops is that you can no longer use them on your lap. They run too hot. This is of course due to the CPU, RAM and hard drive (maybe cdrom if spinning). But the CPU is on the most and runs the hottest of all those. They only put 4200 or 5400 rpm HDs in those machines so the HD can't get as hot as the CPU seems to get. :)
Course it should also have a mode that burns through the case, but gets you those extra fragging frames on Q3
the amd 486dx4-120 (which ran at 40mhz * 3) was a great overclocker, and a whole LOT of people overclocked 'em to 40mhz * 4 = 160mhz.
Lawyers, MBA's, RIAA? A jedi fears not these things!
I hate it whenever Mac-heads point to PPC and show how it's such a great example of RISC that runs "all your programs 2x as fast as the fastest Pentium 4!" In reality, the PowerPC line (not necessarily the POWER line) is very unimpressive. These days, a 1.25 GHz Alpha can still hold its own against a 2.5 GHz P4 in terms of floating point power. Yes, the same Alpha that has been neglected for the last half-decade, whose design has stagnated since the 21264 and whose process technology is antique compared to AMD's and Intel's. But the Alpha still keeps kicking x86 in the head. Yet the PowerPC, running at the same 1.25 GHz, backed by the dual giants Motorola and IBM, built with leading-edge copper fab technology, the second most common desktop RISC architecture (after x86 :) and shipping in every single Apple computer, isn't even competitive with the P4. Damn you DEC! Damn you to all hell!
A deep unwavering belief is a sure sign you're missing something...
I still have one of those kicking around on a Biostar 8433UUD...it's not currently installed in anything, though. For $350 (processor & motherboard) in late 1995/early 1996 (?), it was a deal. It outran a P5-133 Packard Bell at work (not too surprising, since the Packard Bell had no L2 cache and sh*tty onboard video vs. the #9 Motion 531 I had at home). The only downside was that the 40-MHz FSB of the 5x86-120 meant that the PCI bus had to be underclocked (to 26.67 MHz) to keep things stable. I suppose I could've tried running the processor at 133 MHz (4x33) instead of 120 (3x40), but slower access to the L2 cache would probably have made performance about the same.
20 January 2017: the End of an Error.
What I want to see is how it handles memory intensive benchmarks. I think this may be where it will shine, with the DDR interface built directly into the processor, thus eliminating latency and bottlenecks imposed by the north bridge.
The other big advantage most people seem to forget is the amount of memory addressing capability. Where I work, we have racks of Linux X86 servers with 6GB of memory each. While there are hacks to go beyond 4GB, it gets kind of ugly. With Opteron, addressing 6GB or more of memory is not a problem.
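The arithmetic behind the 4GB wall: a 32-bit address reaches 2^32 bytes = 4 GiB, so 6 GiB needs at least 33 address bits. The usual hack on 32-bit x86 is PAE, which widens physical addresses to 36 bits (64 GiB) but still leaves each process a 32-bit virtual address space, hence the ugliness. A small sketch that computes the bits needed; the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of address bits that can name every byte of `bytes`
// of memory, i.e. ceil(log2(bytes)). Integer-only, so no rounding issues.
constexpr unsigned address_bits_needed(std::uint64_t bytes) {
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t{1} << bits) < bytes) ++bits;
    return bits;
}

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

static_assert(address_bits_needed(4 * GiB) == 32, "4 GiB fits in 32 bits");
static_assert(address_bits_needed(6 * GiB) == 33, "6 GiB needs a 33rd bit");
```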
Also, with their Hypertransport bus and supporting multiple processors, the amount of memory scales with the number of CPUs.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
Clarification: AMD asks us to point out that Hammer schedules haven't slipped from its previous advice, as we originally suggested in this article. A spokesman from the company told us that desktop versions of Hammer are still planned to ship (for revenue) in Q1 2003 with systems on shelves at the turn of Q1 2003, not the second half of 2003 as we stated.
No. The containers consist of an unsafe "core" part that works with void* pointers, wrapped in a typesafe template.
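The idea is roughly like this:
// Untyped core: knows only about links. append() is out-of-line.
struct CoreListElement {
    CoreListElement *next = nullptr;
};
class CoreList {
public:
    CoreListElement *head = nullptr, *tail = nullptr;
    void append(CoreListElement *element);  // defined once, in a .cpp
};
void CoreList::append(CoreListElement *element) {  // the out-of-line part
    element->next = nullptr;
    if (tail) tail->next = element; else head = element;
    tail = element;
}
template <typename T>  // typesafe wrapper: per-T code is just new, cast, delete
class List {
    struct ListElement : public CoreListElement {
        explicit ListElement(const T &v) : value(v) {}
        T value;
    };
    CoreList core;
public:
    List() = default;
    List(const List &) = delete;             // owns its nodes: no copies
    List &operator=(const List &) = delete;
    void append(const T &t) { core.append(new ListElement(t)); }
    ~List() {
        for (CoreListElement *e = core.head, *next; e; e = next) {
            next = e->next;
            delete static_cast<ListElement *>(e);
        }
    }
};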
I also have Array<T> (vector), Map<T>, etc. These work almost exactly like STL containers, except most of the code is out of line so compiles are MUCH faster and binaries are MUCH smaller.
My containers have the added advantage that you can embed ListElements in a structure or class to avoid allocating extra memory when inserting into a list or hashtable.
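Embedding the link in your own struct is what's usually called an intrusive container. A self-contained sketch of the idea, with minimal, hypothetical stand-ins for the core types (the poster's real classes aren't shown): the object derives from the link node, so appending it allocates nothing, and the list never owns or frees it.

```cpp
#include <cassert>

// Minimal stand-ins for the untyped core (illustrative only).
struct CoreListElement {
    CoreListElement *next = nullptr;
};

struct CoreList {
    CoreListElement *head = nullptr;
    CoreListElement *tail = nullptr;
    void append(CoreListElement *e) {  // no allocation: just relink
        e->next = nullptr;
        if (tail) tail->next = e; else head = e;
        tail = e;
    }
};

// An object that carries its own link, so it can sit in a list directly.
struct Task : CoreListElement {
    explicit Task(int task_id) : id(task_id) {}
    int id;
};

// Visit every node as a T. The static_cast is only safe because every
// element appended to this particular list really is a T.
template <typename T, typename F>
void for_each_as(const CoreList &list, F visit) {
    for (CoreListElement *e = list.head; e; e = e->next)
        visit(*static_cast<T *>(e));
}
```

The trade-off is that an object with one embedded link can be in only one such list at a time, and it must outlive its membership in the list.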